SYSTEM AND METHOD FOR ADAPTIVE INTERFERENCE CANCELING

RELATED APPLICATIONS INCORPORATED BY REFERENCE

This is a Continuation-In-Part of U.S. Patent Serial No. 09/059,503 filed
August 6, 1998, U.S. Patent Serial No. 08/840,159 filed April 14, 1997, U.S. Patent
Serial No. 09/055,709 filed April 7, 1998, U.S. Patent Serial No. 09/130,923 filed
August 6, 1998 and U.S. Patent Serial No. 09/157,035 filed September 18, 1998, each
of which is hereby incorporated herein by reference.

The following applications and patent(s) are cited and hereby herein
incorporated by reference: U.S. Patent Serial No. 60/126,567 filed March 26, 1999;
U.S. Patent Serial No. 09/252,874 filed February 18, 1999, U.S. Patent Serial No.
09/130,923 filed August 6, 1998, U.S. Patent Serial No. 09/055,709 filed April 7, 1998,
U.S. Patent Serial No. 09/059,503 filed April 13, 1998, U.S. Patent Serial No.
08/840,159 filed April 14, 1997, U.S. Patent Serial No. 09/050,196 filed March 30,
1998, U.S. Patent Serial No. 09/252,874 filed February 18, 1999, U.S. Patent Serial No.
08/672,899 now U.S. Patent No. 5,825,898 issued October 20, 1998; and U.S. Patent
Serial No. 09/089,710 filed June 3, 1998, U.S. Patent No. 5,825,897 issued October 20,
1998, U.S. Patent No. 5,732,143 issued March 24, 1998, U.S. Patent No. 5,673,325
issued September 30, 1997, U.S. Patent No. 5,381,473 issued January 10, 1995,
International Application No. PCT/US99/06764 filed March 29, 1999, U.S. Patent
Serial No. 60/126,567 filed March 26, 1999, International Application No.
PCT/US99/08012 filed April 13, 1999. And, all documents cited herein are
incorporated herein by reference, as are documents cited or referenced in documents
cited herein.


BACKGROUND OF THE INVENTION

The present invention relates generally to integrating a DSDA (Digital
Super Directional Array).

SUMMARY OF THE INVENTION 

It is an object of the present invention to provide an integrated DSDA.
BRIEF DESCRIPTION OF THE DRAWINGS

The objects, features and advantages of the present invention will be 
more readily apparent from the following detailed description of the invention in which: 
FIG. 1 is a block diagram of an overall system; 
FIG. 2 is a block diagram of a sampling unit;
FIG. 3 is a block diagram of an alternative embodiment of a sampling unit;
FIG. 4 is a schematic depiction of tapped delay lines used in a main
channel matrix and a reference matrix unit;
FIG. 5 is a schematic depiction of a main channel matrix unit;
FIG. 6 is a schematic depiction of a reference channel matrix unit;
FIG. 7 is a schematic depiction of a decolorizing filter;
FIG. 8 is a schematic depiction of an inhibiting unit based on directional
interference;
FIG. 9 is a schematic depiction of a frequency-selective constraint
adaptive filter;
FIG. 10 is a block diagram of a frequency-selective weight-constraint unit;
FIG. 11 is a flow chart depicting the operation of a program that can be
used to implement the invention;
FIGS. 12A-H illustrate the DSDA integrated according to the present
invention;
FIGS. 13-18C-2 illustrate the universal interface in accordance with the
present invention;
FIG. 19 is a block diagram of a system using sub-band processing;
FIG. 20 is a block diagram of a system using broadband processing with
frequency-limited adaptation;
FIG. 21 is a block diagram of a system using broadband processing with
an external main-channel generator;
FIGS. 22A-22D are a flow chart depicting the operation of a program
that can be used to implement a method using sub-band processing;
FIGS. 23A-23C are a flow chart depicting the operation of a program
that can be used to implement a method using broad-band processing with
frequency-limited adaptation;
FIGS. 24A-24C are a flow chart depicting the operation of a program
that can be used to implement a method using broad-band processing with an external
main-channel generator;
FIG. 25 is a functional diagram of the overall system including a
microphone array, an A-to-D converter, a band-pass filter, an approximate-direction
finder, a precise-direction finder, and a measurement qualification unit in accordance
with the present invention;
FIG. 26 is a perspective view showing the arrangement of a particular
embodiment of the microphone array of FIG. 25;
FIG. 27 is a functional diagram of an embodiment of the approximate
and exact direction finder of FIG. 25;
FIG. 28 is a functional diagram of an embodiment of the
precise-direction finder of FIG. 25;
FIG. 29 is an exact-direction finder of FIG. 25;
FIG. 30 is the 3-D coordinate system used to describe the present
invention;
FIG. 31A is a functional diagram of a first embodiment of the
measurement qualification unit of FIG. 25;
FIG. 31B is a functional diagram of a second embodiment of the
measurement qualification unit of FIG. 25;
FIG. 31C is a functional diagram of a third embodiment of the
measurement qualification unit of FIG. 25;
FIG. 31D is a functional diagram of a fourth embodiment of the
measurement qualification unit of FIG. 25;
FIGS. 32A-32D are a flow chart depicting the operation of a program
that can be used to implement the method in accordance with the present invention; and
FIGS. 33A-33J are diagrams of the present invention.

DETAILED DESCRIPTION OF THE INVENTION

FIG. 1 is a block diagram of a system in accordance with a preferred
embodiment of the present invention. The system illustrated has a sensor array 1, a
sampling unit 2, a main channel matrix unit 3, a reference channel matrix unit 4, a set of
decolorizing filters 5, a set of frequency-selective constrained adaptive filters 6, a delay
7, a difference unit 8, an inhibiting unit 9, and an output D/A unit 10.

Sensor array 1, having individual sensors 1a-1d, receives signals from a
signal source on-axis from the system and from interference sources located off-axis
from the system. The sensor array is connected to sampling unit 2, which samples the
received signals and has individual sampling elements 2a-2d, where each element is
connected to the corresponding individual sensor to produce digital signals 11.

The outputs of sampling unit 2 are connected to main channel matrix unit 
3 producing a main channel 12 representing signals received in the direction of a 
source. The main channel contains both a source signal component and an interference 
signal component. 

The outputs of sampling unit 2 are also connected to reference channel
matrix unit 4, which generates reference channels 13 representing signals received from
directions other than that of the signal source. Thus, the reference channels represent
interference signals.

The reference channels are filtered through decolorizing filters 5, which
generate flat-frequency reference channels 14 having a frequency spectrum whose
magnitude is substantially flat over a frequency range of interest. Flat-frequency
reference channels 14 are fed into the set of frequency-selective constraint adaptive
filters 6, which generate canceling signals 15.

In the meantime, main channel 12 is delayed through delay 7 so that it is
synchronized with canceling signals 15. Difference unit 8 then subtracts canceling
signals 15 from the delayed main channel to generate a digital output signal 16, which
is converted by D/A unit 10 into analog form. Digital output signal 16 is fed back to the
adaptive filters to update the filter weights of the adaptive filters. Flat-frequency
reference channels 14 are fed to inhibiting unit 9, which estimates the power of each
flat-frequency reference channel as well as the power of the main channel and generates
an inhibit signal 19 to prevent signal leakage.

FIG. 2 depicts a preferred embodiment of the sampling unit. A sensor
array 21, having sensor elements 21a-21d, is connected to an analog front end 22,
having amplifier elements 22a-22d, where each amplifier element is connected to the
output of the corresponding sensor element. In a directional microphone application,
each sensor can be either a directional or omnidirectional microphone. The analog front
end amplifies the received analog sensor signals to match the input requirement of the
sampling elements. The outputs from the analog front end are connected to a set of
delta-sigma A/D converters 23, where each converter samples and digitizes the
amplified analog signals. Delta-sigma sampling is a well-known A/D technique
using both oversampling and digital filtering. For details on delta-sigma A/D sampling,
see Crystal Semiconductor Corporation, Application Note: Delta-Sigma Techniques,
1989.

FIG. 3 shows an alternative embodiment of the sampling unit. A sensor
array 31, having sensor elements 31a-31d, is connected to an amplifier 32, having
amplifier elements 32a-32d, where each amplifier element amplifies the received
signals from the corresponding sensor element. The outputs of the amplifier are
connected to a sample & hold (S/H) unit 33 having sample & hold elements 33a-33d,
where each S/H element samples the amplified analog signal from the corresponding
amplifier element to produce a discrete signal. The outputs from the S/H unit are
multiplexed into a single signal through a multiplexor 34. The output of the multiplexor
is connected to a conventional A/D converter 35 to produce a digital signal.

FIG. 4 is a schematic depiction of tapped delay lines used in the main
channel matrix unit and the reference channel matrix unit in accordance with a preferred
embodiment of the present invention. The tapped delay line used here is defined as a
nonrecursive digital filter, also known in the art as a transversal filter, a finite impulse
response filter or an FIR filter. The illustrated embodiment has 4 tapped delay lines,
40a-40d. Each tapped delay line includes delay elements 41, multipliers 42 and adders
43. Digital signals 44a-44d are fed into the set of tapped delay lines 40a-40d. Delayed
signals through delay elements 41 are multiplied by filter coefficients, Fij, 45 and added
to produce outputs 46a-46d.

The n-th sample of an output from the i-th tapped delay line, Yi(n), can
then be expressed as:

$$Y_i(n) = \sum_{j=0}^{k} F_{i,j}\, X_i(n-j),$$

where k is the length of the filter, and Xi(n) is the n-th sample of an input to
the i-th tapped delay line.
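The tapped delay line sum can be illustrated with a short Python sketch. This is only a minimal illustration, not the patented implementation: the function name `tapped_delay_line` and the example coefficients are hypothetical, and samples before the start of the input are assumed to be zero.

```python
def tapped_delay_line(x, coeffs):
    """Nonrecursive (FIR) filter: y[n] = sum over j of coeffs[j] * x[n - j].

    Assumption: samples before the start of the input are treated as zero.
    """
    y = []
    for n in range(len(x)):
        acc = 0.0
        for j, f in enumerate(coeffs):
            if n - j >= 0:
                acc += f * x[n - j]
        y.append(acc)
    return y


# Hypothetical two-tap averaging filter applied to a short input sequence.
output = tapped_delay_line([1.0, 2.0, 3.0, 4.0], [0.5, 0.5])
```

Each output sample depends only on current and past inputs, which is what makes the filter nonrecursive.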

FIG. 5 depicts the main channel matrix unit for generating a main 
channel in accordance with a preferred embodiment of the present invention. The unit 
has tapped delay lines, 50a-50d, as an input section taking inputs 51a-51d from the 
sampling unit. Its output section includes multipliers, 52a-52d, where each multiplier is 
connected to the corresponding tapped delay line and an adder 53, which sums all 
output signals from the multipliers. The unit generates a main channel 54, as a 
weighted sum of outputs from all multipliers. The filter weights 55a-55d can be any 
combination of fractions as long as their sum is 1. For example, if 4 microphones are
used, the embodiment may use filter weights of 1/4 in order to take into account
the contribution of each microphone.

The unit acts as a beamformer, a spatial filter which filters a signal 
coming in all directions to produce a signal coming in a specific direction without 
physically moving the sensor array. The coefficients of the tapped delay lines and the 
filter weights are set in such a way that the received signals are spatially filtered to 
maximize the sensitivity toward the signal source. 

Since some interference signals find their way to reach the signal source 
due to many factors such as the reverberation of a room, main channel 54 representing 
the received signal in the direction of the signal source contains not only a source signal 
component, but also an interference signal component. 
FIG. 6 depicts the reference channel matrix unit for generating reference
channels in accordance with a preferred embodiment of the present invention. It
has tapped delay lines, 60a-60d, as an input section taking inputs 61a-61d from the
sampling unit. The same tapped delay lines as those of FIG. 4 may be used, in which
case the tapped delay lines may be shared by the main and reference channel matrix
units.

Its output section includes multipliers, 62a-62d, 63a-63d, 64a-64d and
adders 65a-65c, where each multiplier is connected to the corresponding tapped delay
line and adder. The unit acts as a beamformer which generates the reference channels
66a-66c representing signals arriving off-axis from the signal source by obtaining the
weighted differences of certain combinations of outputs from the tapped delay lines.
The filter weight combinations can be any numbers as long as the sum of the filter
weights used to form a given reference channel is 0. For example, the illustrated
embodiment may use a filter weight combination, (W11, W12, W13, W14) = (0.25,
0.25, 0.25, -0.75), in order to combine signals 61a-61d to produce reference channel
66a.

The net effect is placing a null (low sensitivity) in the receiving gain of
the beamformer toward the signal source. As a result, the reference channels represent
interference signals in directions other than that of the signal source. In other words,
the unit "steers" the input digital data to obtain interference signals without physically
moving the sensor array.
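The two weight rules (main channel weights summing to 1, reference channel weights summing to 0) can be checked numerically. The Python sketch below is illustrative only: the helper `combine` and the sample values are hypothetical, and it assumes the tapped delay lines have already aligned the on-axis source so that every sensor sees the same samples.

```python
def combine(delayed_inputs, weights):
    """Weighted sum across sensors, computed independently for each sample index."""
    n_samples = len(delayed_inputs[0])
    return [
        sum(w * channel[n] for w, channel in zip(weights, delayed_inputs))
        for n in range(n_samples)
    ]


main_weights = [0.25, 0.25, 0.25, 0.25]        # sums to 1
reference_weights = [0.25, 0.25, 0.25, -0.75]  # sums to 0, as in the example for channel 66a

# Hypothetical on-axis source: every sensor receives identical samples.
source = [1.0, -2.0, 4.0]
on_axis_inputs = [source, source, source, source]

main = combine(on_axis_inputs, main_weights)            # source passes through unchanged
reference = combine(on_axis_inputs, reference_weights)  # source is nulled

# Hypothetical off-axis interference: the fourth sensor sees a different value.
off_axis_inputs = [[1.0], [1.0], [1.0], [0.0]]
leak = combine(off_axis_inputs, reference_weights)      # nonzero, so it reaches the reference channel
```

Under these assumptions the on-axis source cancels in the reference channel, while a signal that differs between sensors does not.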

FIG. 7 is a schematic depiction of the decolorizing filter in accordance
with a preferred embodiment of the present invention. It is a tapped delay line
including delay elements 71, multipliers 72 and adders 73. A reference channel 74 is
fed into the tapped delay line. Delayed signals are multiplied by filter coefficients, Fi,
75 and added to produce an output 76. The filter coefficients are set in such a way that
the filter amplifies the low-magnitude frequency components of an input signal to
obtain an output signal having a substantially flat frequency spectrum.

As mentioned before in the background section, the output of a
conventional adaptive beamformer suffers a non-uniform frequency behavior. This is
because the reference channels do not have a flat frequency spectrum. The receiving
sensitivity of a beamformer toward a particular angular direction is often described in
terms of a gain curve. As mentioned before, the reference channel is obtained by
placing a null in the gain curve (making the sensor array insensitive) in the direction of
the signal source. The resulting gain curve has a lower gain for lower frequency signals
than higher frequency signals. Since the reference channel is modified to generate a
canceling signal, a non-flat frequency spectrum of the reference channel is translated to
a non-uniform frequency behavior in the system output.

The decolorizing filter is a fixed-coefficient filter which flattens the
frequency spectrum of the reference channel (thus "decolorizing" the reference channel)
by boosting the low frequency portion of the reference channel. By adding the
decolorizing filters to all outputs of the reference channel matrix unit, a substantially
flat frequency response in all directions is obtained.

The decolorizing filter in the illustrated embodiment uses a tapped delay
line filter which is the same as a finite impulse response (FIR) filter, but other kinds of
filters such as an infinite impulse response (IIR) filter can also be used for the
decolorizing filter in an alternative embodiment.

WO 01/31972

PCT/US00/29336

FIG. 8 depicts schematically the inhibiting unit in accordance with a
preferred embodiment of the present invention. It includes power estimation units 81,
82 which estimate the power of a main channel 83 and each reference channel 84,
respectively. A sample power estimation unit 85 calculates the power of each sample.
A multiplier 86 multiplies the power of each sample by a fraction, a, which is the
reciprocal of the number of samples for a given averaging period, to obtain an average
sample power 87. An adder 88 adds the average sample power to the output of another
multiplier 89 which multiplies a previously calculated main channel power average 90
by (1-a). A new main channel power average is obtained by (new sample power) x a +
(old power average) x (1-a). For example, if a 100-sample average is used, a = 0.01.
The updated power average will be (new sample power) x 0.01 + (old power average) x
0.99. In this way, the updated power average will be available at each sampling instant
rather than after an averaging period. Although the illustrated embodiment shows an
on-the-fly estimation method of the power average, other kinds of power estimation
methods can also be used in an alternative embodiment.
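The on-the-fly power average can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the patented implementation: the function name `update_power_average` is hypothetical, and the power of a sample is assumed here to be its squared value.

```python
def update_power_average(old_average, sample, a):
    """Return (new sample power) x a + (old power average) x (1 - a).

    Assumption: the power of a sample is taken to be its squared value.
    """
    return a * (sample * sample) + (1.0 - a) * old_average


a = 1.0 / 100  # 100-sample average, matching the example in the text
average = 0.0
for sample in [2.0] * 1000:  # hypothetical constant-amplitude input
    average = update_power_average(average, sample, a)
```

For a constant input of amplitude 2, the running average approaches a power of 4, and a fresh estimate is available after every sample rather than once per averaging period.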

A multiplier 91 multiplies the main channel power 89 with a threshold
92 to obtain a normalized main channel power average 93. An adder 94 subtracts
reference channel power averages 95 from the normalized main channel power average
93 to produce a difference 96. If the difference is positive, a comparator 97 generates
an inhibit signal 98. The inhibit signal is provided to the adaptive filters to stop the
adaptation process to prevent signal leakage.
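The inhibit decision amounts to a single comparison, sketched below in Python. The function name `inhibit_adaptation` and the numeric values are hypothetical, and summing the reference channel power averages follows the flow chart description of FIG. 11 as an assumption about how multiple reference channels are combined.

```python
def inhibit_adaptation(main_power_avg, reference_power_avgs, threshold):
    """Return True when adaptation should be inhibited.

    Assumption: the reference channel power averages are summed before
    being subtracted from the normalized main channel power average.
    """
    normalized_main = threshold * main_power_avg
    difference = normalized_main - sum(reference_power_avgs)
    return difference > 0.0


# Weak directional interference: reference power is small, so adaptation stops.
weak = inhibit_adaptation(4.0, [0.1, 0.2, 0.1], threshold=0.25)

# Strong directional interference: reference power dominates, so adaptation proceeds.
strong = inhibit_adaptation(4.0, [0.5, 0.5, 0.5], threshold=0.25)
```

With a threshold of 0.25, adaptation is inhibited whenever the total reference power falls below a quarter of the main channel power.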

Although the illustrated embodiment normalizes the main channel power
average, an alternative embodiment may normalize the reference channel power average
instead of the main channel power average. For example, if the threshold 92 in the
illustrated embodiment is 0.25, the same effect can be obtained in the alternative
embodiment by normalizing each reference channel power average by multiplying it by
4.

This inhibition approach is different from the prior art SNR-based
inhibition approach mentioned in the background section in that it detects the presence
of significant directional interference which the prior art approach does not consider.
As a result, the directional-interference-based inhibition approach stops the adaptation
process when there is no significant directional interference to be eliminated, whereas
the prior art approach does not.

For example, where there is a weak source signal (e.g. during speech
intermission) and there is almost no directional interference except some uncorrelated
noise (such as noise due to wind or mechanical vibrations on the sensor structure), the
SNR-based approach would allow the adaptive filter to continue adapting due to the
small SNR. The continued adaptation process is not desirable because there is very
little directional interference to be eliminated in the first place, and the adaptation
process searches in vain for new filter weights to eliminate the uncorrelated noise,
which often results in canceling the source signal component of the received signal.

By contrast, the directional-interference-based inhibition mechanism will
inhibit the adaptation process in such a case because the strength of directional
interference as reflected in the reference channel power average will be smaller than the
normalized main channel power average, producing a positive normalized power
difference. The adaptive process is inhibited as a result until there is some directional
interference to be eliminated.

FIG. 9 shows the frequency-selective-constraint adaptive filter together
with the difference unit in accordance with a preferred embodiment of the present
invention. The frequency-selective constraint adaptive filter 101 includes a finite
impulse response (FIR) filter 102, an LMS weight updating unit 103 and a
frequency-selective weight-constraint unit 104. In an alternative embodiment, an infinite impulse
response (IIR) filter can be used instead of the FIR filter. A flat-frequency reference
channel 105 passes through FIR filter 102 whose filter weights are adjusted to produce a
canceling signal 106 which closely approximates the actual interference signal
component present in a main channel 107. In a preferred embodiment, the main
channel is obtained from the main channel matrix unit after a delay in order to
synchronize the main channel with the canceling signal. In general, there is a delay
between the main channel and the canceling signal because the canceling signal is
obtained by processing reference channels through extra stages of delay, i.e., the
decolorization filters and adaptive filters. In an alternative embodiment, the main
channel directly from the main channel matrix unit may be used if the delay is not
significant.

A difference unit 108 subtracts canceling signal 106 from main channel
107 to generate an output signal 109. Adaptive filter 101 adjusts filter weights,
W1-Wn, to minimize the power of the output signal. When the filter weights settle, output
signal 109 represents the source signal substantially free of the actual interference signal
component because canceling signal 106 closely tracks the interference signal
component. Output signal 109 is sent to the output D/A unit to produce an analog
output signal. Output signal 109 is also used to adjust the adaptive filter weights to
further reduce the interference signal component.

There are many techniques to continuously update the values of the filter
weights. The preferred embodiment uses the Least Mean-Square (LMS) algorithm,
which minimizes the mean-square value of the difference between the main channel and
the canceling signal, but in an alternative embodiment, other algorithms such as
Recursive Least Squares (RLS) can also be used.

Under the LMS algorithm, the adaptive filter weights are updated
according to the following:

$$W_p(n+1) = W_p(n) + 2\mu\, r(n-p)\, e(n)$$

where n is a discrete time index; W_p is the p-th filter weight of the adaptive filter; e(n) is a
difference signal between the main channel signal and the canceling signal; r(n) is a
reference channel; and μ is an adaptation constant that controls the speed of adaptation.
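One step of the LMS weight update can be sketched in Python as follows. This is a toy illustration, not the patented implementation: the function name `lms_step`, the single-tap filter, the alternating reference signal, and the adaptation constant of 0.1 are all hypothetical choices made so that the behavior is easy to follow.

```python
def lms_step(weights, ref_history, main_sample, mu):
    """Apply one LMS update.

    ref_history[p] holds r(n - p), the reference sample delayed by p.
    Returns the updated weights and the difference signal e(n).
    """
    cancel = sum(w * r for w, r in zip(weights, ref_history))
    e = main_sample - cancel
    new_weights = [w + 2.0 * mu * r * e for w, r in zip(weights, ref_history)]
    return new_weights, e


# Toy case: the main channel contains only interference equal to 0.5 x reference.
weights = [0.0]
for n in range(50):
    r = 1.0 if n % 2 == 0 else -1.0
    main_sample = 0.5 * r
    weights, e = lms_step(weights, [r], main_sample, mu=0.1)
```

In this toy case the single weight moves toward 0.5, the gain that makes the canceling signal match the interference, and the difference signal shrinks toward zero.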
FIG. 10 depicts a preferred embodiment of the frequency-selective
weight-constraint unit. The frequency-selective weight-constraint unit 110 includes a Fast
Fourier Transform (FFT) unit 112, a set of frequency bins 114, a set of truncating units
115, a set of storage cells 116, and an Inverse Fast Fourier Transform (IFFT) unit 117,
connected in series.

The FFT unit 112 receives adaptive filter weights 111 and performs the
FFT of the filter weights 111 to obtain frequency representation values 113. The
frequency representation values are then divided into a set of frequency bands and
stored into the frequency bins 114a-114h. Each frequency bin stores the frequency
representation values within a specific bandwidth assigned to each bin. The values
represent the operation of the adaptive filter with respect to a specific frequency
component of the source signal. Each of the truncating units 115a-115h compares the
frequency representation values with a threshold assigned to each bin, and truncates the
values if they exceed the threshold. The truncated frequency representation values are
temporarily stored in storage cells 116a-116h before the IFFT unit 117 converts them back to new
filter weight values 118.
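The transform, truncate, and inverse-transform sequence can be sketched in Python. Several assumptions are made here that the text does not specify: truncation is interpreted as limiting the magnitude of each frequency value while keeping its phase, each frequency index is treated as its own bin rather than being grouped into bands, and a direct DFT stands in for the FFT for brevity. The names `dft`, `idft` and `constrain_weights` are hypothetical.

```python
import cmath


def dft(values):
    """Direct discrete Fourier transform (stand-in for the FFT unit)."""
    n = len(values)
    return [
        sum(values[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Direct inverse discrete Fourier transform (stand-in for the IFFT unit)."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def constrain_weights(weights, thresholds):
    """Limit each frequency value of the weights to its threshold magnitude.

    thresholds[k] is the limit for frequency index k. For real-valued output,
    the thresholds should be symmetric: thresholds[k] == thresholds[n - k].
    """
    limited = []
    for value, limit in zip(dft(weights), thresholds):
        magnitude = abs(value)
        if magnitude > limit:
            value = value * (limit / magnitude)  # keep the phase, clip the magnitude
        limited.append(value)
    return [v.real for v in idft(limited)]


# A weight vector whose spectrum has magnitude 4 at every frequency index.
constrained = constrain_weights([4.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0])

# Weights already below every threshold pass through essentially unchanged.
unchanged = constrain_weights([0.5, 0.25, 0.0, 0.0], [10.0, 10.0, 10.0, 10.0])
```

Because each frequency index has its own limit, growth can be restricted in one narrow band without disturbing the weights' behavior at other frequencies.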

In addition to the inhibiting mechanism based on directional interference,
the frequency-selective weight-constraint unit further controls the adaptation process
based on the frequency spectrum of the received source signal. Once the adaptive filter
starts working, the performance change in the output of the filter, better or worse,
becomes drastic. Uncontrolled adaptation can quickly lead to a drastic performance
degradation.

The weight-constraint mechanism is based on the observation that a large
increase in the adaptive filter weight values hints at signal leakage. If the adaptive filter
works properly, there is no need for the filter to increase the filter weights to large
values. But, if the filter is not working properly, the filter weights tend to grow to large
values.

One way to curb the growth is to use a simple truncating mechanism to
truncate the values of filter weights to predetermined threshold values. In this way,
even if the overall signal power may be high enough to trigger the inhibition
mechanism, the weight-constraint mechanism can still prevent the signal leakage.

For narrow band signals, such as a speech signal or a tonal signal, having
their power spectral density concentrated in a narrow frequency range, signal leakage
may not be manifested in a large growth of the filter weight values in the time domain.
However, the filter weight values in the frequency domain will indicate some increase
because they represent the operation of the adaptive filter in response to a specific
frequency component of the source signal. The frequency-selective weight-constraint
unit detects that condition by sensing a large increase in the frequency representation
values of the filter weights. By truncating the frequency representation values in the
narrow frequency band of interest and inverse-transforming them back to the time
domain, the unit acts to prevent the signal leakage involving narrow band signals.

The system described herein may be implemented using commercially
available digital signal processing (DSP) systems such as the Analog Devices 2100 series.

FIG. 11 shows a flow chart depicting the operation of a program for a
DSP processor in accordance with a preferred embodiment of the present invention.

After the program starts at step 100, the program initializes registers and
pointers as well as buffers (step 110). The program then waits for an interrupt from a
sampling unit requesting processing of samples received from the array of sensors
(step 120). When the sampling unit sends an interrupt (step 131) that the samples are
ready, the program reads the sample values (step 130) and stores the values (step 140).
The program filters the stored values using a routine implementing a tapped delay line
and stores the filtered input values (step 141).

The program then retrieves the filtered input values (step 151) and main 
channel matrix coefficients (step 152) to generate a main channel (step 150) by 
multiplying the two and to store the result (step 160). 

The program retrieves the filtered input values (step 171) and reference
channel matrix coefficients (step 172) to generate a reference channel (reference
channel #1) by multiplying the two (step 170) and to store the result (step 180). Steps
170 and 180 are repeated to generate all other reference channels (step 190).

The program retrieves one of the reference channels (step 201) and 
decolorization filter coefficients for the corresponding reference channel (step 202) to 
generate a flat-frequency reference channel by multiplying the two (step 200) and stores 
the result (step 210). Steps 200 and 210 are repeated for all other reference channels 
(step 220). 

The program retrieves one of the flat-frequency reference channels (step
231) and adaptive filter coefficients (step 232) to generate a canceling signal (step 230)
by multiplying the two and to store the result (step 240). Steps 230 and 240 are
repeated for all other reference channels to generate more canceling signals (step 250).



The program retrieves canceling signals (steps 262-263) to subtract them
from the main channel (retrieved at step 261) to cancel the interference signal
component in the main channel (step 260). The output is sent to a D/A unit to
reproduce the signal without interference in analog form (step 264). The output is also
stored (step 270).

The program calculates the power of a reference channel sample (step
281) and retrieves an old reference channel power average (step 282). The program
multiplies the sample power by a and the old power average by (1-a), and sums them
(step 280), and stores the result as a new power average (step 290). This process is
repeated for all other reference channels (step 300) and the total sum of power averages
of all reference channels is stored (step 310).
The program multiplies the power of a main channel sample (retrieved at
step 321) by a and an old main channel power average (retrieved at step 322) by (1-a),
sums them (step 320) and stores the result as a new main channel power average (step 330).

The program then multiplies the main channel power with a threshold to
obtain a normalized main channel power average (step 340). The program subtracts the
total reference channel power average (retrieved at step 341) from the normalized main
channel power average to produce a difference (step 350). If the difference is positive,
the program goes back to step 120 where it simply waits for more samples.

If the difference is negative, the program enters a weight-updating
routine. The program calculates a new filter weight by adding [2 x adaptation constant
x reference channel sample (retrieved at step 361) x output (retrieved at step 362)] to an
old filter weight (retrieved at step 363) to update the weight (step 360) and stores the
result (step 370).

The program performs the FFT of the new filter weights to obtain their
frequency representation (step 380). The frequency representation values are divided
into several frequency bands and stored into a set of frequency bins (step 390). The
frequency representation values in each bin are compared with a threshold associated
with each frequency bin (step 400). If the values exceed the threshold, the values are
truncated to the threshold (step 410). The program performs the IFFT to convert the
truncated frequency representation values back to filter weight values (step 420) and
stores them (step 430). The program repeats the weight-updating routine, steps 360-430,
for all other reference channels and associated adaptive filters (step 440). The
program then goes back to step 120 to wait for an interrupt for a new round of
processing samples (step 450).

The microphone array of the present invention may be embodied as a
digital super directional array™ (DSDA) 120 shown in Fig. 12A. The DSDA 120
shown in Fig. 12A is formed of a substantially cylindrical housing or "wand" which is
elongated in one direction with the microphone elements of the array arranged therein
and aligned with slats or any other suitable spacing for allowing sound to be received by
the microphones in the array. It will be appreciated that the DSDA 120 may be
incorporated into a keyboard 122 as shown in Fig. 12B, an automobile visor 124 as
shown in Fig. 12C, an automobile mirror 126 as shown in Fig. 12D, a mouse 128 as
shown in Fig. 12E or a video camera 130 as shown in Fig. 12F.

The keyboard 122 shown in Fig. 12B incorporates the DSDA 120 therein
with the microphones aligned with slats or any other suitable spacing for allowing
sound to be received by the microphones. The DSDA processing may be performed by
hardware implementing the adaptive beamforming technique of the present invention, or
the microphone array signals may be coupled to a computer (not shown) through the serial
keyboard port, COM port, LPT port, USB port or other suitable means such as radio
frequency or infra-red transmission. In the computer implementation, the adaptive
beamforming technique is performed by software installed in the computer. The DSDA
may be flush with the keyboard so as not to be distinguishable or may be formed within
a raised portion which serves to position the DSDA closer to the computer user's mouth.
The raised portion may be elongated in a direction toward a position commensurate
with a typical position of the computer user's mouth such as directly in front of and
above the keyboard. Alternatively, the DSDA may be housed in a boom or a wand
which is either attached to the computer at one end, fixedly or non-fixedly such as by a
hinge, or coupled by a connecting wire. Alternatively, the DSDA may be
wirelessly coupled to the keyboard by any suitable wireless transmission means. In the
last instance, the keyboard may include a receiving platform such as a stand or a
depression for receiving the DSDA whereby the DSDA is removed by the user from the
receiving platform and either spoken into like a hand-held microphone or placed by the
user in any convenient location.
Fig. 12B further shows that the DSDA array may be configured as a
microphone at one or more corners of the keyboard including all four corners. In
addition, the DSDA may be configured as a plurality of microphones in any or all of the
corners of the keyboard such as two or four microphones in each corner. The DSDA
may be embodied integrally with the keyboard such as a flip-up style accessory built into
the keyboard which otherwise is unnoticeable when concealed within the keyboard and,
when tilted upward, advances beyond the surface of the keyboard to expose the
microphone array.

Fig. 12B further illustrates the DSDA integrated into the bottom of the
keyboard with the slats or other means for allowing sound to enter the microphones
adjacent thereto to receive audio signals. With this arrangement, there is provided a
small gap between the bottom of the keyboard and a supporting surface such as a
desktop which creates a pressure zone microphone effect between the DSDA facing
downward and the supporting surface which minimizes acoustic reflections to achieve
direct sound reception by the DSDA. In at least one embodiment, the DSDA is
configured on the bottom of the keyboard adjacent or at the leading edge. The instant
invention takes into account the tendency of digital processing to fail to remove
acoustic reflections caused by audible sounds reflecting off surfaces in a room or other
objects therein. This so-called "hollow effect" is minimized in the present invention by
providing the DSDA beneath the keyboard whereby the acoustic reflections are
minimized due to the slight air gap between the DSDA and the supporting surface
which creates the pressure zone effect. It will be appreciated that the pressure zone
microphone in accordance with the present invention may be created with any
peripheral, including those shown in Figs. 12A-F, by forming the DSDA between the
peripheral and a supporting surface.

Fig. 12C illustrates a further embodiment of the DSDA which is housed
in a substantially flat housing 134 having two substantially parallel sides elongated in
one direction with one side including slats or other suitable means for allowing sound to
be received by the microphones arranged adjacent thereto. It will be appreciated that
the DSDA has a substantially flat profile such that the DSDA may fit snugly between an
upper surface of the visor 124 of the automobile and a ceiling of the interior of the
automobile. A holding member 136 may be included for holding the DSDA which
includes a pair of opposed pincer-like arms 138 which receive and hold therein the
DSDA. It is possible that each arm includes a distal portion formed such that a spacing
between opposed distal portions is slightly less than a width of the DSDA and a spacing
between opposed arms is substantially the same width as the DSDA. The DSDA is
inserted into said holding member by forcibly inserting the DSDA between said pair of
opposed arms, causing the opposed arms to be slightly separated such that the DSDA is
slid therebetween and afterward the DSDA is snapped in place in an area formed
between the opposed arms. Alternatively, the DSDA is slid into the area formed
between the opposed arms from one side. In this last instance, the DSDA housing may
be formed with a longitudinal groove which meets a protruding portion of the housing
member such as the distal portion of one or more of the arms or a nodule formed in said
housing specifically constructed for meeting the groove for holding the DSDA within
the housing.

The DSDA may be affixed to the visor, or any portion of the automobile 
for that matter, by use of suction cups such as on the window or the dashboard, 
magnetic strips formed on the DSDA and the surface where the DSDA is to be mounted 
or Velcro™, for example. In addition, hooking members 132 substantially formed in a 
shape resembling clothespins may be provided for hooking the DSDA or holding 
member which holds the DSDA to the visor whereby the hooking members include a 
spacing between flexibly rigid opposed members which are curved to provide 
increasing resistance when spread apart. In one embodiment, one side of the hooking 
member engages an opening or slot 140 formed within the holding member while the 
opposite side of the hooking member engages the bottom of the visor such that the visor 
is sandwiched between the opposed sides of the hooking member firmly enough such 
that the visor can be swung open and closed without the hooking members losing the 
holding member or the DSDA held therein. 

It will be appreciated that the DSDA coupled to the visor of an 
automobile creates a small air gap between the DSDA and the ceiling of the automobile 
thereby generating a pressure zone effect and minimizing acoustic reflections within the 
automobile. 

Fig. 12D illustrates the DSDA integrated into a rear-view mirror 126 of
an automobile. In this instance, the microphones of the array are spaced along the rim
of the rear-view mirror. It will be appreciated that this microphone arrangement has
similar properties to the array shown in Fig. 26. A flexible, tubular housing may be
provided for housing the microphones in the array such that the tubular housing may be
applied by the driver by fitting the tubular housing around the rear-view mirror for ease
of installation. In the alternative, the DSDA may be provided in a long wand-like
member attached to the rear-view mirror 126. The processing hardware for processing
the sound received by the microphone array may be incorporated into the interior of the
rear-view mirror or, alternatively, the sound may be transmitted via wired or wireless
transmission means to a remote processor within the automobile. It is within the scope
of the invention to provide, as a separate processor or integrated with the processor for
processing the audio signal, a processor for controlling automobile components such as
the radio, earphone or global positioning satellite navigation system, for example, in
accordance with the processed sound received by the DSDA.

Fig. 12E illustrates the DSDA integrated within a mouse 128 wherein the
microphones of the microphone array are disposed adjacent slats or other means for
suitably allowing sound to be received by the microphones. The mouse is otherwise a
standard mouse except for the DSDA. However, additional features may be linked to
the mouse keys which affect array performance such as array volume, beam direction,
setting an array type, array tuning, etc. As with the other embodiments of the DSDA,
the mouse may be wired or wirelessly connected to the DSDA processing circuitry
and/or a personal computer.

Fig. 12F illustrates the DSDA integrated with a video camera such as
that used for video teleconferencing over, for example, the internet. In this instance, the
DSDA may be incorporated as a peripheral to the video camera, coupled thereto by
coupling means such as a microphone plug and held on the video camera by, for
example, a holding member or Velcro™. In the alternative, the DSDA in this
embodiment may be incorporated into the video camera. As with the other
embodiments of the DSDA, the DSDA may include wireless transmission to the video
camera such that the video operator may place the DSDA in the vicinity of the talent,
such as an actor or actress, and record the scene from a distance.

Fig. 12G illustrates the noise canceling stethoscope of the present
invention which incorporates the DSDA. The noise canceling stethoscope of U.S. Patent
Serial No. 08/963,164 filed November 4, 1997 (now U.S. Patent No. 5,909,495 issued
June 1, 1999) is incorporated herein by reference, and one skilled in the art will
appreciate that it may be integrated with the spectral subtraction techniques herein
described and/or microphone array technology. Thus, the present invention is
applicable to medical applications including ultrasound for canceling noise when
reading ultrasound vibrations echoed in a body and retrieved for reconstruction on a
display of the portion of the body including, for example, ultrasound examinations for
imaging fetuses. It will be instantly recognized that removing noise in such medical
applications from either sound received by the noise canceling stethoscope or
ultrasound advantageously cancels noise, thereby providing improved audio signals for
the noise canceling stethoscope or for imaging in ultrasound. It will be appreciated that
the DSDA or microphone may be incorporated in the noise canceling stethoscope or
ultrasound device and/or the hardware/software for processing the audio to remove the
noise may be incorporated in those devices as well.

In addition, the DSDA of the present invention is incorporated into the
remote control and keypad for the set top box as illustrated in Fig. 12H. The remote
control, as described in copending U.S. Appln. Ser. No. 09/050,196 (filed March 30,
1998) and copending international application PCT/US99/06764 (filed March 29,
1999), both incorporated herein by reference, is operable to input textual data on
numeric and/or alphabetic keypad(s) and includes the ability to wirelessly transmit
speech commands received by the microphone (DSDA) to the set top box. The UVI
may be incorporated into the remote control to interface the speech signals received by
the DSDA or the remote control may include the noise cancellation/reduction
processing herein described.

In the alternative, the set top box may include the speech processing. 
The set top box as described interfaces to a television to provide both operation of the 
television and external sources such as cable services, internet or other on-line service. 
To that end, the set top box may incorporate the processing ability for supporting 
internet access.

Fig. 13 illustrates the universal voice interface 142 which may embody
the DSDA. It will be appreciated that any type of microphone may be incorporated as 
UVI, including an electret, a stereo, unidirectional or multi-directional microphone. The
received audio signals are transferred from the universal voice interface by any
appropriate means including wired or wireless transmission such as infrared or radio
frequency transmission. It will be appreciated that the universal voice interface may
comprise the microphone by itself or include interface circuitry such as analog-to-digital
converters and a multiplexor for interface into a computer processor. The audio
signals are received by any known communication port of a computer including the
serial or parallel port or for that matter the USB port. A device driver may be included
for driving the processor 144 or the processor itself may strobe the appropriate port
register for the audio signals converted into digital data. It will also be appreciated that
the audio signals may be input to a sound card installed in the computer and then
forwarded by the appropriate device driver to the processor 144. On the other hand,
either the device driver or the sound card may provide the processing circuitry or
software for processing the audio signal to remove noise. In any case, the audio signals
are signal processed to remove the noise in accordance with the adaptive beam forming
techniques described herein. In addition, or in the alternative, the audio processing may
include noise cancellation in which the noise portion of the signal, extracted by a
separate microphone or by spectral subtraction processing, is inverted and subtracted
from the main reference signal.

Fig. 14 illustrates the universal voice interface circuitry. Of course, the
universal voice interface may be provided by software. In this example, the universal
voice interface is coupled to the DSDA incorporated around the rim of the rear-view
mirror 126. It will be appreciated that other microphone arrangements may be
incorporated with the universal voice interface circuitry such as those illustrated in Figs.
12A-F. The universal voice interface circuitry may be incorporated into the
microphone arrangement or coupled thereto by any appropriate transmission means
including wired or wireless transmission. The UVI interfaces the analog audio signals
received by the microphone arrangement to a digital processor such as that found in a
personal computer and, therefore, includes an analog-to-digital converter series 146,
each A/D converter in the series corresponding to a microphone in the DSDA. Of
course, a single analog-to-digital converter may be provided where, for example, a
single microphone element is employed. In this instance, the A/D converters 146 are
driven by a 44 kHz clock controlled by the microprocessor 150; but the clock may be of
any clock speed corresponding to the processor speed of the system. Also in this
example, the A/D converters 146 output 16-bit samples; but the sample size may vary
for different systems or applications. The digitized samples are coupled to a multiplexer
148 where they are multiplexed at the control of the microprocessor in a predetermined
order and forwarded to the microprocessor 150. The figure shows that the samples are
forwarded to the microprocessor on a 16-bit channel; but the channels may be of any
bandwidth including a bit stream. The microprocessor may include processing
hardware/software for processing the audio samples in a format which agrees with the
later digital speech processor. In addition, the microprocessor 150 may provide the
audio processing such as the adaptive beam forming techniques or noise cancellation
techniques herein described. This example illustrates that the microprocessor 150 is
manufactured as an application-specific integrated circuit (ASIC); but the present
invention also may be practiced as other IC structures. There is provided a dedicated
adaptive filter 152 for assisting the microprocessor 150 in the adaptive beam forming
techniques herein described. The processed audio signal is forwarded to a digital
speech processor, such as the speech recognition unit which recognizes and controls
audio driven components.

Fig. 15 shows the adaptive beam forming technique of the present
invention incorporated in the UVI. In the example shown in Fig. 15, the audio signals
received by each microphone in the DSDA are received by the A/D converter 146 and,
once digitally converted, are coupled to corresponding band pass filters 154 which act
as digital samplers. It will be appreciated that the A/D converter 146 may also act as a
band pass filter whose filtering characteristic is controlled by the clock rate which
operates the A/D converter. A direction calculation unit 156 is provided for calculating
the direction of an audio sound source which drives the band pass filters to steer the
direction in which sound is primarily received in accordance with the direction
calculated by the direction calculation unit 156. The main channel matrix 158 is
provided for receiving the signals in the main channel of the beam formed by the
direction calculation unit 156 in accordance with weights provided from the direction
calculation unit. The reference channel matrix 160 is provided for receiving the audio
signals which are substantially not within the beam formed by the direction calculation
unit 156. It will be appreciated that the direction calculation unit is controlled by the
system controller 172. Down converters 162, 164 are provided to down convert the
signals received by the main and reference channels respectively. The dedicated
adaptive filter 166 adaptively processes the audio signals in accordance with the
adaptive beam forming techniques described herein. An arithmetic logic unit 168 is
provided for subtracting from the main channel the adaptively formed reference channel
noise as controlled by the system controller 172. The resulting substantially noise-free
signal is provided to a multiplexer 170 which multiplexes the noise-free signal with the
main channel signal as controlled by the system controller 172. In this manner, the
system controller 172 controls the multiplexer 170 to either select the main channel or
the channel with noise removed. It is shown in the figure that the multiplexer is a four
input multiplexer; however, any other equivalent means may be provided for selecting
between the signals. The multiplexed signal is output to a digital speech processor such
as a speech recognition processor which recognizes speech and controls, in response
thereto, audio-driven components.

Figs. 16 and 17 illustrate the operation of the universal voice interface.
In step 174 the system is reset. Control advances to step 176 wherein the direction
calculation unit is enabled. A determination is made in step 178 whether the direction
calculation is in error, and control advances to step 180 where a system alarm is raised
if the answer to the determination is in the affirmative. Otherwise, the direction
calculation is correct and control advances to step 182 wherein a direction result is
awaited. When the direction result is obtained, control advances to step 184 wherein the
main and reference channel weights are set for the respective main and reference channel
matrices. Control advances to step 186 wherein the system awaits a ready signal and,
upon receiving the ready signal, step 188 enables the down conversion. The operation
is further described in Fig. 17, wherein step 190 enables the dedicated adaptive filter. A
determination is made in step 192 whether a filter error has occurred and, if so, an
alarm is raised and the system is reset in step 196. Otherwise, in step 194, the filtered
result is awaited and, upon receiving the filtered result, the arithmetic logic unit is
enabled in step 198. If the ALU commits an error as determined by step 200, an alarm is
raised and the system is reset in step 210. Otherwise, the multiplexer is enabled in step 212.
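
The control flow of Figs. 16 and 17 may be summarized by the following sketch, which
simply lists the steps visited for a given combination of error conditions. The
function name and its boolean parameters are hypothetical and are introduced for
illustration only; the waiting steps are shown as if they complete immediately.

```python
def uvi_sequence(direction_error=False, filter_error=False, alu_error=False):
    """Return the step numbers visited in one pass through Figs. 16 and 17."""
    steps = [174, 176, 178]            # reset, enable direction calculation, check it
    if direction_error:
        return steps + [180]           # system alarm
    steps += [182, 184, 186, 188]      # await direction, set weights, await ready, down convert
    steps += [190, 192]                # enable adaptive filter, check for filter error
    if filter_error:
        return steps + [196]           # alarm and reset
    steps += [194, 198, 200]           # await filtered result, enable ALU, check ALU
    if alu_error:
        return steps + [210]           # alarm and reset
    return steps + [212]               # enable multiplexer

# Example call: the error-free path ends by enabling the multiplexer.
normal_path = uvi_sequence()
```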

Fig. 18 shows the universal voice interface incorporated with a computer
monitor wherein the microphones of the DSDA array are situated around the perimeter
of the face of the monitor facing the computer user. In this example, the UVI
incorporates A/D converters 146 which digitize the audio signals received from the
respective microphones in the array. In at least one embodiment, the A/D converters
146 are incorporated into the body of a standard personal computer plug which plugs
into any standard type of personal computer port. In addition, it is within the scope of
the present invention to power the analog-to-digital converter 146 through the power
pin of the personal computer port. In the example shown in the figure, the plug is an
RS-232 Interface/Parallel Port Plug which plugs into a corresponding RS-232/Parallel
port. In the example, there is shown a monitor; however, it is well within the scope of
the present invention to incorporate the microphone arrangement, which includes either
the microphone array or the several types of microphones described, in virtually any
fixture or appliance such as those shown in Figs. 12A-F.



A. System Implementation 
1. Sub-band Processing

FIG. 19 shows one preferred embodiment of the present invention using
sub-bands, where an adaptive filter is driven from the sub-bands rather than the entire
bandwidth of the input signal. Sub-bands result from partitioning a broader band in any
manner as long as the sub-bands can be combined together so that the broader band can
be reconstructed without distortions. One may use a so-called "perfect reconstruction
structure" as known in the art to split the broadband into sub-bands and to combine the
sub-bands together substantially without distortion. For details on perfect
reconstruction structures, see P.P. Vaidyanathan, Quadrature Mirror Filter Banks,
M-Band Extensions and Perfect-Reconstruction Techniques, IEEE ASSP Magazine,
pp. 4-20, July 1987.

In the preferred embodiment, a broader band is partitioned into
sub-bands, using several partitioning steps successively through intermediate bands.
Broadband inputs from an array of sensors, 191a-191d, are sampled at an appropriate
sampling frequency and entered into a main-channel matrix 192 and a
reference-channel matrix 193. The main-channel matrix generates a main channel, a signal
received in the main looking direction of the sensor array, which contains a target signal
component and an interference component. F1, 194, and F2, 195 are splitters which
first split the main channel into two intermediate bands, followed by down-sampling by
two. Down-sampling is a well-known procedure in digital signal processing.
Down-sampling by two, for example, is a process of sub-sampling by taking every other data
point. Down-sampling is indicated by a downward arrow in the figure. Splitters F3,
196 and F4, 197 further split the lower intermediate band into two sub-bands followed
by down-sampling by two.

In an example using a 16 kHz input signal, the result is a 0-4 kHz lower
sub-band with 1/4 of the input sampling rate, a 4-8 kHz upper sub-band with 1/4 of the
input sampling rate, and another upper 8-16 kHz intermediate band with 1/2 of the input
sampling rate.
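
The band edges and relative sampling rates in the example above follow from two
successive halvings of the lowest band. The arithmetic may be checked with a short
sketch; the function name and the tuple layout (lower edge in kHz, upper edge in kHz,
fraction of the input sampling rate) are assumptions made for illustration.

```python
from fractions import Fraction

def split_lowest_band(bands):
    """Split the lowest band into two halves, each at half the previous rate."""
    low_khz, high_khz, rate = bands[0]
    mid_khz = (low_khz + high_khz) / 2
    halves = [(low_khz, mid_khz, rate / 2), (mid_khz, high_khz, rate / 2)]
    return halves + bands[1:]

plan = [(0.0, 16.0, Fraction(1))]   # the whole 0-16 kHz band at the input rate
plan = split_lowest_band(plan)      # F1/F2 stage: two intermediate bands
plan = split_lowest_band(plan)      # F3/F4 stage: lower band split again

for low_khz, high_khz, rate in plan:
    print(f"{low_khz:g}-{high_khz:g} kHz at {rate} of the input sampling rate")
```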

The reference channels are processed in the same way by filters F1, 198,
and F2, 199, to provide only the lower sub-band with 1/4 of the input sampling rate,
while the other sub-bands are discarded.

The lower sub-bands of the reference channels are fed into an adaptive
filter 1910, which generates canceling signals approximating interferences present in the
main channel. A subtracter 1911 subtracts the canceling signals from the lower
sub-band of the main channel to generate an output in the lower sub-band. The output is fed
back to the adaptive filter for updating the filter weights. The adaptive filter processing
and the subtraction are performed at the lower sampling rate appropriate for the lower
sub-band. At the same time the other upper bands of the main channel are delayed by
delay units, 1912 and 1913, each by an appropriate time, to compensate for various
delays caused by the different processing each sub-band is going through, and to
synchronize them with the other sub-bands. The delay units can be implemented by a
series of registers or a programmable delay. The output from the subtracter is combined
with the other two sub-bands of the main channel through the reconstruction filters
H1-H4, 1914-1917, to reconstruct a broadband output. H1-H4 may be designed such that
they together with F1-F4 provide a theoretically perfect reconstruction without any
distortions.
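
The feedback loop formed by the adaptive filter 1910 and the subtracter 1911 may be
illustrated with the sketch below. It is a minimal sketch only: the passage above does
not specify the weight-update rule, so a normalized least-mean-squares (NLMS) update is
assumed here as a stand-in, a single reference channel is used, and the names
`nlms_cancel`, `taps`, `mu` and `eps` are hypothetical.

```python
def nlms_cancel(main, reference, taps=4, mu=0.5, eps=1e-9):
    """Cancel interference in `main` using one reference channel.

    Returns (output, weights). `main` and `reference` are lower sub-band
    sample lists of equal length at the reduced sampling rate.
    """
    weights = [0.0] * taps
    history = [0.0] * taps                 # recent reference samples, newest first
    output = []
    for m, r in zip(main, reference):
        history = [r] + history[:-1]
        cancel = sum(w * h for w, h in zip(weights, history))   # canceling signal
        error = m - cancel                 # subtracter output, fed back to the filter
        power = eps + sum(h * h for h in history)
        weights = [w + mu * error * h / power for w, h in zip(weights, history)]
        output.append(error)
    return output, weights

# Example call: the main channel holds only interference, a scaled copy of the reference.
reference = [1.0 if (7 * n) % 5 < 2 else -1.0 for n in range(200)]
main = [0.5 * r for r in reference]
output, weights = nlms_cancel(main, reference, taps=1)
```

Because the loop runs only on the lower sub-band, it processes one sample for every
four input samples, which is where the computational saving of this embodiment arises.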

Reconstruction filters H3 and H4 combine the lower and upper sub-bands into a
low intermediate band, followed by an interpolation by two. An interpolation is a
well-known procedure in digital signal processing. Interpolation by two, for example, is an
up-sampling process that doubles the number of samples by keeping the existing data
points as every other sample and interpolating the samples in between. Up-sampling is
indicated by an upward arrow in the figure. The reconstruction filters H1, 1916 and H2,
1917 further combine the two intermediate bands into a broadband.
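
The two-stage split and reconstruction may be illustrated with the simplest
perfect-reconstruction pair, the two-tap Haar filters. The use of Haar filters is an
assumption made for illustration; the actual F1-F4 and H1-H4 responses are not given in
this passage, and longer filters would also require the delay compensation described
above. All function names are hypothetical.

```python
import math

S = 1.0 / math.sqrt(2.0)

def split(x):
    """Analysis stage: low and high bands, each down-sampled by two."""
    low = [(x[i] + x[i + 1]) * S for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) * S for i in range(0, len(x), 2)]
    return low, high

def merge(low, high):
    """Synthesis stage: up-sample by two and combine the two bands."""
    out = []
    for l, h in zip(low, high):
        out.extend([(l + h) * S, (l - h) * S])
    return out

def analyze(x):
    """Two-stage split; len(x) must be a multiple of four."""
    lower_mid, upper_mid = split(x)            # F1/F2 stage
    lower_sub, upper_sub = split(lower_mid)    # F3/F4 stage
    return lower_sub, upper_sub, upper_mid

def reconstruct(lower_sub, upper_sub, upper_mid):
    lower_mid = merge(lower_sub, upper_sub)    # H3/H4 stage
    return merge(lower_mid, upper_mid)         # H1/H2 stage

# Example call: splitting and recombining returns the input up to rounding error.
signal = [1.0, 4.0, -2.0, 3.0, 0.5, 0.0, 7.0, -1.0]
rebuilt = reconstruct(*analyze(signal))
```

In the embodiment of FIG. 19 the lower sub-band would be replaced by the subtracter
output before reconstruction; the sketch passes it through unchanged to show that the
filter bank itself introduces no distortion.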

In the preferred embodiment described, non-adaptive filter processing is
performed in the upper sub-band of 4-16 kHz. Adaptive filter processing is
performed in the lower sub-band of 0-4 kHz where most of the interference is located.
Since there is little computation overhead involved in the non-adaptive filter processing,
the use of non-adaptive filter processing in the upper sub-band can reduce the
computational burden significantly. The result is superior performance without an
expensive increase in the required hardware.

2. Broadband Processing with Band-Limited Adaptation

FIG. 20 shows another preferred embodiment using broadband
processing with band-limited adaptation. Instead of using sub-band canceling signals
which act on a sub-band main channel, the embodiment uses broadband canceling
signals which act on a broadband main channel. But, since adaptive filter processing is
done in a low-frequency domain, the resulting canceling signals are converted to
broadband signals so that they can be subtracted from the broadband main channel.

As before, broadband inputs from an array of sensors, 2021a-2021d, are
sampled at an appropriate sampling frequency and entered into a main-channel matrix
2022 and a reference-channel matrix 2023. The main-channel matrix generates a main
channel, a signal received in the main-looking direction, which has a target signal
component and an interference component. The reference-channel matrix generates
reference channels representing interferences received from all other directions. A
low-pass filter 2025 filters the reference channels and down-samples them to provide
low-frequency signals to an adaptive filter 2026.

The adaptive filter 2026 acts on these low-frequency signals to generate
low-frequency canceling signals which estimate a low-frequency portion of the
interference component of the main channel. The low-frequency canceling signals are
converted to broadband signals by an interpolator 2028 so that they can be subtracted
from the main channel by a subtracter 2029 to produce a broadband output.

The broadband output is low-pass filtered and down-sampled by a filter
2024 to provide a low-frequency feedback signal to the adaptive filter 2026. In the
meantime, the main channel is delayed by a delay unit 2027 to synchronize it with the
canceling signals from the adaptive filter 2026.
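
The rate conversions and the delay alignment around the adaptive filter may be
sketched as follows. This is an illustrative sketch under simplifying assumptions: a
two-point average stands in for the low-pass filters 2024 and 2025, linear
interpolation stands in for the interpolator 2028, a rate ratio of two is assumed, and
all function names are hypothetical. The adaptive filter itself is omitted.

```python
from collections import deque

def decimate_by_two(x):
    """Crude low-pass (two-point average) followed by down-sampling by two."""
    return [(x[i] + x[i + 1]) / 2.0 for i in range(0, len(x) - 1, 2)]

def interpolate_by_two(y):
    """Up-sample by two, filling the in-between samples by linear interpolation."""
    out = []
    for i, v in enumerate(y):
        nxt = y[i + 1] if i + 1 < len(y) else v
        out.extend([v, (v + nxt) / 2.0])
    return out

def delay(x, n):
    """Delay x by n samples using a chain of n registers initialised to zero."""
    registers = deque([0.0] * n)
    out = []
    for v in x:
        registers.append(v)
        out.append(registers.popleft())
    return out

def broadband_subtract(main, low_rate_cancel, delay_samples):
    """Subtract interpolated canceling signals from the delayed main channel."""
    cancel = interpolate_by_two(low_rate_cancel)
    delayed_main = delay(main, delay_samples)
    return [m - c for m, c in zip(delayed_main, cancel)]

# Example call: a low-rate canceling signal is brought up to the broadband rate
# and removed from the main channel.
output = broadband_subtract([1.0, 2.0, 3.0, 3.0], [1.0, 3.0], 0)
```

The value passed as `delay_samples` would be chosen to match the latency of the
low-pass, adaptive filter and interpolation path in a given design.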



WO 01/31972    PCT/US00/29336

3. Broadband Processing with an External Main-Channel Generator

FIG. 21 shows yet another preferred embodiment similar to the previous 
embodiment except that an external main-channel generator is used instead of a main- 
channel matrix to obtain a broadband main channel. This embodiment is useful when it 
is desired to take advantage of the broadband capabilities of commercially available hi- 
fi microphones. 

A broadband input is obtained by using an external main-channel 
generator, such as a shotgun microphone 2143 or a parabolic dish 2144. The broadband 
input is sampled through a high fidelity A-to-D converter 2145. The sampling rate 
should preferably be high enough to maintain the broad bandwidth and the audio quality 
of the external main-channel generator. 

A reference-channel matrix 2142 is used to obtain low-frequency 
reference channels representing interferences in the low-frequency domain. Since 
adaptive filter processing is done in the low-frequency domain, the reference-channel 
matrix does not need a broadband capability. 

A subtracter 2150 is used to subtract canceling signals estimating 
interferences from the broadband input. The broadband output is filtered by a low-pass 
filter 2146 which also performs down-sampling. The low-pass filtered output and the 
low-frequency reference channels are provided to an adaptive filter 2147. The adaptive 
filter acts on these low frequency signals to generate low-frequency canceling signals. 
In the meantime, the broadband input is delayed by a delay unit 2148 so that it can be 
synchronized with the canceling signals from the adaptive filter 2147. The delay unit 
can be implemented by a series of registers or by a programmable delay. The low- 
frequency canceling signals are converted to broadband canceling signals by an 
interpolator 2149 so that they can be subtracted from the broadband main channel to 
produce the broadband output.

B. Software Implementation

The invention described herein may be implemented using a
commercially available digital signal processor (DSP) such as Analog Devices' 2100
Series or any other general purpose microprocessor. For more information on the Analog
Devices 2100 Series, see Analog Devices, ADSP-2100 Family User's Manual, 3rd Ed.,
1995.

1. Sub-Band Processing 

FIGS. 22A-22D are a flow chart depicting the operation of a program in
accordance with the first preferred embodiment of the present invention using sub-band
processing.

Upon starting at step 22100, the program initializes registers and pointers
as well as buffers (steps 22110-22120). When a sampling unit sends an interrupt (step
22131) that samples are ready, the program reads the sample values (step 22130), and
stores them in memory (step 22140).

The program retrieves the input values (step 22151) and main-channel
matrix coefficients (step 22152) to generate a main channel by filtering the input
values using the coefficients (step 22150), and then stores the result in memory (step
22160).

The program retrieves the input values (step 22171) and reference-channel
matrix coefficients (step 22172) to generate a reference channel by filtering the
input values using the coefficients (step 22170), and then stores the result (step 22180).
Steps 22170 and 22180 are repeated to generate all other reference channels (step
22190).
The program retrieves the main channel (step 22201) and the F1 filter
coefficients (step 22202) to generate a lower intermediate band with 1/2 of the
sampling rate appropriate for the whole main channel by filtering the main channel with
the coefficients and down-sampling the filtered output (step 22210), and then stores the
result (step 22220). Similarly, the F2 filter coefficients are used to generate an upper
intermediate band with 1/2 of the sampling rate (step 22240). The F3 and F4 filter
coefficients are used to further generate a lower sub-band with 1/4 of the sampling rate
(step 22260) and an upper sub-band with 1/4 of the sampling rate (step 22280).

The program retrieves one of the reference channels (step 22291) and the 
F1 filter coefficients (step 22292) to generate an intermediate band with 1/2 of the 
sampling rate by filtering the reference channel with the coefficients and 
down-sampling the filtered output (step 22290), and then stores the result (step 22300). 
Similarly, the F2 filter coefficients are used to generate a lower sub-band with 1/4 of the 
sampling rate (step 22320). Steps 22290-22320 are repeated for all the other reference 
channels (step 22330). 
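
By way of illustration only, the filter-and-down-sample operation described above may be sketched in Python as follows. The two-tap filters F1 and F2 below are hypothetical stand-ins (a simple averaging/differencing pair) for the actual filter coefficients, which are not specified in this description; the function names are likewise assumptions.

```python
def fir_filter(x, h):
    """Causal FIR filter: y[n] = sum_k h[k] * x[n - k], zero initial state."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, tap in enumerate(h):
            if n - k >= 0:
                acc += tap * x[n - k]
        y.append(acc)
    return y


def analyze_half_band(x, h):
    """Filter x with taps h, then keep every second sample (half the rate)."""
    return fir_filter(x, h)[::2]


# Hypothetical coefficients standing in for the F1 (low) and F2 (high) filters.
F1 = [0.5, 0.5]
F2 = [0.5, -0.5]

main_channel = [1.0] * 8           # a constant (purely low-frequency) input
lower = analyze_half_band(main_channel, F1)
upper = analyze_half_band(main_channel, F2)
```

Applying the same routine again to the lower intermediate band would yield the 1/4-rate sub-bands in the same manner.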

The program retrieves the reference channels (step 22341) and the main 
channel (step 22342) to generate canceling signals using an adaptive beamforming 
process routine (step 22340). The program subtracts the canceling signals from the 
main channel to cancel the interference component in the main channel (step 22350). 
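
The adaptive beamforming process routine itself is not detailed in this passage. Purely as an illustrative assumption, the generation and subtraction of a canceling signal may be sketched with a normalized LMS (NLMS) adaptive filter on a single reference channel; the function name, step size `mu`, and tap count are hypothetical and are not taken from the description.

```python
def nlms_cancel(main, reference, taps=4, mu=0.5, eps=1e-8):
    """Subtract an adaptively filtered copy of `reference` from `main`.

    Assumed NLMS update; returns the main channel after cancellation.
    """
    w = [0.0] * taps
    out = []
    for n in range(len(main)):
        # Most recent `taps` reference samples, newest first, zero-padded.
        x = [reference[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        cancel = sum(wk * xk for wk, xk in zip(w, x))   # canceling signal
        e = main[n] - cancel                            # output after subtraction
        norm = sum(xk * xk for xk in x) + eps
        w = [wk + mu * e * xk / norm for wk, xk in zip(w, x)]
        out.append(e)
    return out
```

With several reference channels, one such filter per channel would be run and the canceling signals summed before subtraction.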

The program then interpolates the output from the adaptive beamforming 
process routine (step 22360) and filters the output with the H3 filter coefficients (step 
22361) to obtain an up-sampled version (step 22370). The program also interpolates the 
main channel in the lower band (step 22380) and filters it with the H4 filter coefficients 
(step 22381) to obtain an up-sampled version (step 22390). The program combines the 
up-sampled versions to obtain a lower intermediate main channel (step 22400). 

The program interpolates the lower intermediate main channel (step 
22410) and filters it with the H1 filter coefficients (step 22420) to obtain an up-sampled 
version (step 22420). The program also interpolates the upper intermediate main 
channel (step 22430) and filters it with the H2 filter coefficients (step 22431) to obtain 
an up-sampled version (step 22440). The program combines the up-sampled versions to 
obtain a broadband output (step 22450). 
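
The interpolate, filter, and combine steps may be sketched as follows, as a minimal example only. The two-tap synthesis filters are hypothetical stand-ins for the H1 and H2 coefficients, chosen so that they reconstruct (with a one-sample delay) a signal that was split by an averaging/differencing pair of analysis filters with taps [0.5, 0.5] and [0.5, -0.5].

```python
def upsample2(x):
    """Insert a zero after every sample (interpolation by 2 before filtering)."""
    out = []
    for v in x:
        out.extend([v, 0.0])
    return out


def fir_filter(x, h):
    """Causal FIR filter: y[n] = sum_k h[k] * x[n - k], zero initial state."""
    return [
        sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
        for n in range(len(x))
    ]


def synthesize(lower, upper, h_low, h_high):
    """Up-sample both bands, filter each, and add them to form the wider band."""
    lo = fir_filter(upsample2(lower), h_low)
    hi = fir_filter(upsample2(upper), h_high)
    return [a + b for a, b in zip(lo, hi)]


# Hypothetical synthesis taps standing in for H1 and H2.
H1 = [1.0, 1.0]
H2 = [-1.0, 1.0]
```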

2. Broadband Processing with Frequency-Limited Adaptation 

FIGS. 23A-C are a flow chart depicting the operation of a program in 
accordance with the second preferred embodiment of the present invention using 
broadband processing with frequency-limited adaptation. 

Upon starting at step 23500, the program initializes registers and pointers 
as well as buffers (steps 23510-23520). When a sampling unit sends an interrupt (step 
23531) that the samples are ready, the program reads the sample values (step 23530), 
and stores them in memory (step 23540). 
The program retrieves the broadband sample values (step 23551) and the 
main-channel matrix coefficients (step 23552) to generate a broadband main channel by 
filtering the broadband sample values with the coefficients (step 23550), and then stores 
the result in memory (step 23560). 

The program retrieves the broadband samples (step 23571) and 
reference-channel matrix coefficients (step 23572) to generate a broadband reference 
channel by filtering the samples using the coefficients (step 23570), and then stores the 
result (step 23580). Steps 23570 and 23580 are repeated to generate all the other 
reference channels (step 23590). 

The program retrieves the reference channels (step 23601) which are 
down-sampled (step 23602), the main channel (step 23603) which is also down-sampled 
to the low sampling rate (step 23604), and the low-frequency output (step 23605) to 
generate a low-frequency canceling signal (step 23600) using an adaptive beamforming 
process routine. The program updates the adaptive filter weights (step 23610) and 
interpolates the low-frequency canceling signal to generate a broadband canceling 
signal (step 23620). Steps 23610-23620 are repeated for all the other reference 
channels (step 23630). 

The program subtracts the canceling signals from the main channel to 
cancel the interference component in the main channel (step 23640). 

The program low-pass filters and interpolates the broadband output (step 
23650) so that the low-frequency output can be fed back to update the adaptive filter 
weights. 

3. Broadband Processing with an External Main-Channel Generator 



WO 01/31972 



PCT/US00/29336 




FIGS. 24A-24C are a flow chart depicting the operation of a program in 
accordance with the third preferred embodiment of the present invention using 
broadband processing with an external main-channel generator. 

Upon starting at step 24700, the program initializes registers and pointers 
as well as buffers (steps 24710-24720). When a sampling unit sends an interrupt (step 
24731) that samples are ready, the program reads the sample values (step 24730), and 
stores them in memory (step 24740). 

The program then reads a broadband input from the external main- 
channel generator (step 24750), and stores it as a main channel (step 24760). 

The program retrieves the low-frequency input (step 24771) and 
reference-channel matrix coefficients (step 24772) to generate a reference channel by 
multiplying the two (step 24770), and then stores the result (step 24780). Steps 24770 
and 24780 are repeated to generate all the other reference channels (step 24790). 

The program retrieves the low-frequency reference channels (step 
24801), the main channel (step 24802) which is down-sampled (step 24803), and a 
low-frequency output (step 24804) to generate low-frequency canceling signals (step 24800) 
using an adaptive beamforming process routine. The program updates the adaptive 
filter weights (step 24810) and interpolates the low-frequency canceling signal to 
generate the broadband canceling signal (step 24820). Steps 24810-24820 are repeated 
for all the other reference channels (step 24830). The program subtracts the 
broadband canceling signals from the broadband main channel to generate the 
broadband output with substantially reduced interferences (step 24840). 

The program low-pass filters and interpolates the broadband output (step 
24850) so that the low-frequency output can be fed back to update the adaptive filter 
weights. 

FIG. 25 shows the functional blocks of a preferred embodiment in 
accordance with the present invention. The embodiment deals with finding the 
direction of a sound source, but the invention is not limited to such. It will be 
understood by those skilled in the art that the invention can be readily used for finding 
the direction of other wave sources such as an electromagnetic wave source. 

The system includes an array of microphones 2501 that sense or measure 
sound from a particular sound source and that produce analog signals 2507 representing 
the measured sound. The analog signals 2507 are then sampled and converted to 
corresponding digital signals 2508 by an analog-to-digital (A-to-D) converter 2502. 
The digital signals 2508 are filtered by a band-pass filter 2503 so that the filtered 
signals 2509 contain only the frequencies in a specific bandwidth of interest for the 
purpose of determining the direction of the sound source. The filtered signals 2509 are 
then fed into an approximate-direction finder 2504 which calculates an approximate 
direction 2510 in terms of a microphone pair selected among the microphones. The 
precise-direction finder 2505 estimates the precise direction 2511 of the sound source based 
on the approximate direction. The validity of the precise direction 2511 is checked by a 
measurement qualification unit 2506, which invalidates the precise direction if it does 
not satisfy a set of measurement criteria. Each functional block is explained in more 
detail below. 


5.1. Microphone Array 

FIG. 26 shows an example of the array of microphones 2501 that may be used 
in accordance with the present invention. The microphones sense or measure the 
incident sound waves from a sound source and generate electronic signals (analog 
signals) representing the sound. The microphones may be omnidirectional, cardioid, or dipole 
microphones, or any combination of such microphones. 

The example shows a cylindrical structure 2621 with six microphones 
2622-2627 mounted around its periphery, and an upper, center microphone 2628 mounted 
at the center of the upper surface of the structure. The upper, center microphone is 
optional, but its presence improves the accuracy of the precise direction, especially the 
elevation angle. Although the example shows the microphones in 
a circular arrangement, the microphone array may take on a 
variety of different geometries such as a linear array or a rectangular array. 



5.2 A-to-D Converter 

The analog signals representing the sound sensed or measured by the 
microphones are converted to digital signals by the A-to-D converter 2502, which 
samples the analog signals at an appropriate sampling frequency. The converter may 
employ a well-known technique of sigma-delta sampling, which consists of 
oversampling and built-in low-pass filtering followed by decimation to avoid aliasing, a 
phenomenon due to inadequate sampling. 

When an analog signal is sampled, the sampling process creates a mirror 
representation of the original frequencies of the analog signal around the frequencies 
that are multiples of the sampling frequency. "Aliasing" refers to the situation where 
the analog signal contains information at frequencies above one half of the sampling 
frequency so that the reflected frequencies cross over the original frequencies, thereby 
distorting the original signal. In order to avoid aliasing, an analog signal should be 
sampled at a rate at least twice its maximum frequency component, known as the 
Nyquist frequency. 

In practice, a sampling frequency far greater than the Nyquist frequency 
is used to avoid aliasing problems with system noise and less-than-ideal filter responses. 
This oversampling is followed by low-pass filtering to cut off the frequency 
components above the maximum frequency component of the original analog signal. 
Once the digital signal is Nyquist limited, the rate must be reduced by decimation. If 
the oversampling frequency is n times the Nyquist frequency, the rate of the digital 
signal after oversampling needs to be reduced by decimation, which takes one sample 
for every n samples input. 
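
As a minimal sketch of low-pass filtering followed by decimation, the following Python routine averages each block of n input samples and emits one output sample per block. The n-point moving average is an assumed, deliberately simple low-pass filter; an actual converter would normally use a sharper filter.

```python
def decimate(samples, n):
    """Average each block of n samples (crude low-pass) and keep one value per block."""
    out = []
    for start in range(0, len(samples) - n + 1, n):
        block = samples[start:start + n]
        out.append(sum(block) / n)
    return out


# Example: an input oversampled by a factor of 5 is reduced to 1/5 of the rate.
reduced = decimate([1.0] * 5 + [3.0] * 5, 5)
```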

An alternative approach to avoid aliasing is to limit the bandwidth of 
signals to one half of the sampling frequency using an analog filter before the sampling 
process. This approach, however, would require an analog filter with a very sharp 
frequency cut-off characteristic. 

5.3 Bandpass Filter 

The purpose of the bandpass filter 2503 is to filter the signals sensed or 
measured by the microphones so that the filtered signals contain those frequencies 
optimal for detecting or determining the direction of the signals. Signals of too low a 
frequency do not produce enough phase difference at the microphones to accurately 
detect the direction. Signals of too high a frequency have less signal energy and are 
thus more subject to noise. By suppressing signals of the extreme high and low 
frequencies, the bandpass filter 2503 passes those signals of a specific bandwidth that 
can be further processed to detect or determine the direction of the sound source. The 
specific values of the bandwidth depend on the type of target wave source. If the 
source is a human speaker, the bandwidth may be between 300 Hz and 1500 Hz where 
typical speech signals have most of their energy concentrated. The bandwidth may also 
be changed by a calibration process, a trial-and-error process. Instead of using a fixed 
bandwidth during the operation, initially a certain bandwidth is tried. If too many 
measurement errors result, the bandwidth is adjusted to decrease the measurement 
errors so as to arrive at the optimal bandwidth. 

5.4 Direction Estimation 

For efficiency of computation, the system first finds the approximate 
direction of the sound source, without the burden of heavy computation, and 
subsequently calculates the precise direction by using more computation power. The 
approximate direction is also used to determine the subset of microphones that are 
relevant to subsequent refinement of the approximate direction. In some configurations, 
some of the microphones may not have a line of sight to the source, and thus may create 
phase errors if they participate in further refinement of the approximate direction. 
Therefore, a subset of microphones is selected that would be relevant to further 
refinement of the source direction. 



5.4.1 Approximate-Direction Finding 

FIG. 27 shows the approximate and exact direction finding units 2721, 
2722. 

FIG. 28 shows the approximate-direction finder 2821 in detail. It is based 
on the idea of specifying the approximate direction of the sound source in terms of a 
direction perpendicular to a pair of microphones. Let peripheral microphone pairs be 
the microphones located adjacent to each other around the periphery of the structure 
holding the microphones, except that a microphone located at the center of the structure, 
if any, is excluded. For each peripheral microphone pair, "pair direction" is defined 
as the direction in the horizontal plane, pointing from the center of the pair outward 
from the structure, perpendicular to the line connecting the peripheral microphone pair. 

"Sector direction" is then defined as the pair direction closest to the 
source direction, selected among possible pair directions. If there are n pairs of 
peripheral microphones, there would be n candidates for the sector direction. 

The sector direction corresponding to the sound source is determined 
using a zero-delay cross-correlation. For each peripheral microphone pair, a correlation 
calculator 2831 calculates a zero-delay cross-correlation of two signals received from 
the microphone pair, $X_i(t)$ and $X_j(t)$. It is known to those skilled in the art that such a 
zero-delay cross-correlation function, $R_{ij}(0)$, over a time period T can be defined by the 
following formula: 

$$R_{ij}(0) = \sum_{t=0}^{T} X_i(t)\,X_j(t)$$

It is noted that a correlation calculator is well-known to those skilled in the art and may 
be available as an integrated circuit. Otherwise, it is well-known that such a correlation 
calculator can be built using discrete electronic components such as multipliers, adders, 
and shift registers. 

Among the peripheral microphone pairs, block 2832 finds the sector 
direction by selecting the microphone pair that produces the maximum correlation. 
Since the signals having the same or similar phase are correlated with each other, the 
result is to find the pair with the same phase (equi-phase) or having the least phase 
difference. Since the plane of equi-phase is perpendicular to the propagation direction 
of the sound wave, the pair direction of the maximum correlation pair is, then, the sector 
direction, i.e., the pair direction closest to the source direction. 
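
The selection of the maximum-correlation pair may be sketched as follows. This is an illustrative example only; it assumes that the peripheral microphone signals are supplied as equal-length lists ordered around the periphery, so that each microphone is paired with its next neighbor, and the function names are hypothetical.

```python
def zero_delay_correlation(xi, xj):
    """R_ij(0): sum over the block of the sample-by-sample products."""
    return sum(a * b for a, b in zip(xi, xj))


def find_sector_pair(signals):
    """Return ((i, j), correlation) for the adjacent pair with maximum R_ij(0)."""
    n = len(signals)
    best_pair, best_corr = None, None
    for i in range(n):
        j = (i + 1) % n                      # neighbor around the periphery
        corr = zero_delay_correlation(signals[i], signals[j])
        if best_corr is None or corr > best_corr:
            best_pair, best_corr = (i, j), corr
    return best_pair, best_corr
```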

Once the sector direction is found, block 2833 identifies the microphones 
that participate in further refinement of the approximate direction. "Sector" is defined 
as the subset of the microphones in the microphone array, which participate in 
calculating the precise direction of the sound source. For example, where some of the 
microphones in the array are blocked by a mechanical structure, the signals received by 
those microphones are not likely to be from direct-travelling waves, and thus such 
microphones should be excluded from the sector. 

In one preferred embodiment, the sector includes the maximum-correlation 
peripheral microphone pair, another peripheral microphone adjacent to the 
pair, and a center microphone, if any. Of two peripheral microphones adjacent to the 
maximum-correlation peripheral microphone pair, the one with a higher zero-delay 
cross-correlation is selected. The inclusion of the center microphone is optional, but the 
inclusion helps to improve the accuracy of the source direction, because otherwise three 
adjacent microphones would be arranged almost in a straight line. There may be other 
ways of selecting the microphones to be included in the sector, and the information 
about such selection schemes may be stored in computer memory for an easy retrieval 
during the operation of the system. 
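
One possible reading of this sector selection is sketched below. It assumes that `pair_corr[i]` holds the zero-delay cross-correlation between peripheral microphone i and microphone i+1 (wrapping around), and that each candidate neighbor is judged by its correlation with the pair member it adjoins; both the data layout and that interpretation are assumptions made for the example.

```python
def select_sector(pair_corr, has_center=True):
    """Return the sector as a list of microphone indices (plus 'center' if present)."""
    n = len(pair_corr)
    i = max(range(n), key=lambda k: pair_corr[k])   # maximum-correlation pair (i, j)
    j = (i + 1) % n
    left = (i - 1) % n       # neighbor adjoining microphone i
    right = (j + 1) % n      # neighbor adjoining microphone j
    # pair_corr[left] correlates (left, i); pair_corr[j] correlates (j, right).
    if pair_corr[left] >= pair_corr[j]:
        sector = [left, i, j]
    else:
        sector = [i, j, right]
    if has_center:
        sector.append("center")
    return sector
```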

5.4.2 Precise-Direction Finding 

The precise-direction finder 2505 (FIG. 29) calculates the precise direction of 
the sound source using a full cross-correlation. Block 2941 first identifies all possible 
combinations of microphone pairs within the sector. For each microphone pair 
identified, block 2942 calculates a full cross-correlation, $R_{ij}(\tau)$, over a time period T 
using the following formula, which is well known to those skilled in the art: 

$$R_{ij}(\tau) = \sum_{t=0}^{T} X_i(t)\,X_j(t-\tau)$$

As mentioned before, a correlation calculator is well-known to those skilled in the art 
and may be available as an integrated circuit. Otherwise, it is well-known that such a 
correlation calculator can be built using discrete electronic components such as 
multipliers, adders, and shift registers. 
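
In software, the full cross-correlation and the location of its peak may be sketched as follows. The lag range `max_lag` and the treatment of samples outside the block as zero are assumptions made for this example.

```python
def cross_correlation(xi, xj, max_lag):
    """Return {tau: R_ij(tau)} with R_ij(tau) = sum_t xi[t] * xj[t - tau]."""
    result = {}
    for tau in range(-max_lag, max_lag + 1):
        acc = 0.0
        for t in range(len(xi)):
            s = t - tau
            if 0 <= s < len(xj):             # samples outside the block count as zero
                acc += xi[t] * xj[s]
        result[tau] = acc
    return result


def peak_delay(corr):
    """Return the lag (in samples) at which the correlation is largest."""
    return max(corr, key=corr.get)
```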

$R_{ij}(\tau)$ can be plotted as a cross-correlation curve. For each $R_{ij}(\tau)$, block 
2943 finds the delay, $\tau_s$, corresponding to the peak point of the cross-correlation curve. 
Note that this peak-correlation delay $\tau_s$ lies at a sampling point. In reality, however, the 
maximum-correlation point may be located between sampling points. Therefore, block 
2944 calculates such maximum-correlation delay (which may be between sampling 
points), $\tau_d$, by interpolating the cross-correlation function using a parabolic curve 
($y = px^2 + qx + r$) as follows: 

$$C(k-1) = p(k-1)^2 + q(k-1) + r$$
$$C(k) = pk^2 + qk + r$$
$$C(k+1) = p(k+1)^2 + q(k+1) + r$$

By solving the above equations for p, q, and r, the maximum point is obtained by 
taking the derivative of the parabolic curve and setting it 
to zero. The maximum point lies at $x = -q/(2p)$, and $\tau_d$ is further expressed as follows: 

$$\tau_d = \frac{1}{f_s}\left(k + \frac{C(k-1) - C(k+1)}{2\,\bigl(C(k-1) - 2C(k) + C(k+1)\bigr)}\right)$$

where $f_s$ denotes the sampling frequency; k denotes the sampling point corresponding to 
$\tau_s$; and C(k) is the cross-correlation value at sampling point k. The use of the 
interpolation technique improves the accuracy of the maximum-correlation delay, while 
eliminating the need for using a very high sampling rate. 
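
The parabolic interpolation can be written compactly. The sketch below takes the three correlation values around the peak sampling point k and returns the interpolated delay in seconds; the function name and the fallback for a flat curve are assumptions.

```python
def interpolate_peak_delay(c_prev, c_peak, c_next, k, fs):
    """Vertex of the parabola through C(k-1), C(k), C(k+1), converted to seconds."""
    denom = 2.0 * (c_prev - 2.0 * c_peak + c_next)
    if denom == 0.0:
        return k / fs                 # flat curve: fall back to the sampling point
    return (k + (c_prev - c_next) / denom) / fs
```

For example, sampling the curve $-(x - 2.3)^2$ at x = 1, 2, 3 with k = 2 places the interpolated peak at 2.3 samples.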

Since each maximum-correlation delay calculated for each microphone 
pair indicates the direction of the sound source measured by individual microphone 
pairs, the individual maximum-correlation delays are combined to estimate an average 
direction of the sound source. The estimation process provides a better indication of the 
source direction than each individual measured direction because it eliminates 
ambiguity problems inherent to each individual pair and provides a mechanism to verify 
the relevancy of the individual measurements by possibly eliminating those individual 
measurements that are far off from the source direction. 

Block 2945 calculates the precise direction of the sound source in terms 
of a vector in the Cartesian coordinates, $K = (K_x, K_y, K_z)$, from the vector of individual 
measured delays, $T_d$, by solving the linear equation between K and $T_d$. The time delay 
between any two sensors is equal to the projection of the distance vector between them 
along the K vector divided by the sound velocity. Thus, the $T_d$ vector can be expressed 
as follows: 

$$T_d = -(RK)/c$$

where c is the speed of sound; R denotes the matrix representing the geometry of the 
microphone array in terms of position differences among the microphones as follows: 

$$R = \begin{bmatrix} X_2 - X_1 & Y_2 - Y_1 & Z_2 - Z_1 \\ \vdots & \vdots & \vdots \\ X_M - X_1 & Y_M - Y_1 & Z_M - Z_1 \end{bmatrix}$$

Since the above equation is over-determined in that there are more 
constraints than the number of variables, the least-square (LS) method is used to obtain 
the optimal solution. Defining the error as the difference between the measured time 
delay vector and the evaluated time delay calculated, the error vector $\varepsilon$ is given by: 

$$\varepsilon = (RK/c) + T_d$$

The solution depends on the covariance matrix A of the delay measurements, which is 
defined by 

$$A = E\{T_d T_d^T\} - E\{T_d\}E\{T_d\}^T = \mathrm{COV}\{T_d\}$$

where $E\{\cdot\}$ denotes the expected value operator, and $\{\cdot\}^T$ denotes the transpose of a 
matrix. The LS estimated solution, K, is then expressed in the following formula: 

$$K = -c\,(R^T A^{-1} R)^{-1} R^T A^{-1}\, T_d = -c\,B\,T_d$$

where $\{\cdot\}^{-1}$ denotes the inverse of a matrix. For derivation of the equation, see A. Gelb, 
Applied Optimal Estimation, the MIT Press, p. 103. 

Note that the B matrix depends only on the geometry of the microphone 
array, and thus can be computed off-line, without burdening the computation 
requirement during the direction determination. 
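
A minimal numerical sketch of the least-squares solution is given below. For simplicity it assumes that the covariance matrix of the delay measurements is the identity, so that the solution reduces to $K = -c\,(R^T R)^{-1} R^T T_d$; the speed of sound value and the function names are also assumptions made for the example.

```python
def solve3(a, b):
    """Solve the 3x3 system a x = b by Gauss-Jordan elimination with pivoting."""
    m = [row[:] + [bv] for row, bv in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]


def estimate_direction(R, Td, c=343.0):
    """Unweighted LS estimate of K from position differences R and delays Td."""
    # Normal equations: (R^T R) x = R^T Td, then K = -c x.
    rtr = [[sum(row[i] * row[j] for row in R) for j in range(3)] for i in range(3)]
    rtt = [sum(row[i] * t for row, t in zip(R, Td)) for i in range(3)]
    return [-c * v for v in solve3(rtr, rtt)]
```

Because the matrix multiplying $T_d$ depends only on R (and on the assumed covariance), it could equally be computed once and stored.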

Block 2946 converts K into polar coordinates. FIG. 30 shows the 
3-dimensional coordinate system used in the present invention. An azimuth angle, $\phi$, is 
defined as the angle of the source direction in the horizontal plane, measured clockwise 
from a reference horizontal direction (e.g. x-axis). An elevation angle, $\theta$, is defined as 
the vertical angle of the source direction measured from the vertical axis (z-axis). 

Block 2946 calculates $\phi$ and $\theta$ from $K_x$, $K_y$, and $K_z$ by converting the 
Cartesian coordinates to the polar coordinates by solving the nonlinear equation 
between $(K_x, K_y, K_z)$ and $(\phi, \theta)$: 

$$\begin{bmatrix} K_x \\ K_y \\ K_z \end{bmatrix} = \begin{bmatrix} \sin\theta\,\cos\phi \\ \sin\theta\,\sin\phi \\ \cos\theta \end{bmatrix}$$

In the case of a 3-dimensional microphone array (FIG. 30) (with the 
upper microphone), the above equation yields three non-linear equations with two 
unknowns $(\phi, \theta)$. The problem is over-determined in that there are more equations than 
the number of variables. The LS solution for $(\phi, \theta)$ has no closed-form solution, but a 
suboptimal, closed-form estimation can be found as: 

$$\phi = \tan^{-1}(K_y / K_x)$$
$$\theta = \tan^{-1}\!\left(\sqrt{K_x^2 + K_y^2}\,/\,K_z\right)$$

If a 2-dimensional microphone array were used (without the upper 
microphone), block 2946 calculates $\phi$ and $\theta$ from $K_x$ and $K_y$ using the following formula: 

$$\phi = \tan^{-1}(K_y / K_x)$$
$$\theta = \cos^{-1}\!\left(\sqrt{1 - (K_x^2 + K_y^2)}\right)$$

Note that the algorithm can function even when the microphones are 
arranged in a 2-dimensional arrangement and is still capable of resolving the azimuth and 
elevation. 
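
The conversion to azimuth and elevation may be sketched as follows. As a design choice for this example, `math.atan2` is used in place of a plain inverse tangent so that the quadrant is resolved and a zero denominator is tolerated; the clamping of $K_x^2 + K_y^2$ to 1 is likewise an assumption to guard against rounding error.

```python
import math


def direction_angles(kx, ky, kz=None):
    """Return (azimuth, elevation) in radians from the direction vector K.

    If kz is None, the 2-dimensional formula (unit-length K assumed) is used.
    """
    phi = math.atan2(ky, kx)
    if kz is not None:
        theta = math.atan2(math.hypot(kx, ky), kz)
    else:
        s = min(1.0, kx * kx + ky * ky)
        theta = math.acos(math.sqrt(1.0 - s))
    return phi, theta
```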



5.5 Measurement Qualification Unit 

When the precise-direction finder 2722 calculates the precise direction 
of the sound source, the result may not reflect the true direction of the sound source due 
to various noise and measurement errors. The purpose of the measurement qualification 
unit 2506 is to evaluate the soundness or validity of the precise direction using a variety 
of measurement criteria and invalidate the measurements if the criteria are not satisfied. 

FIGS. 31a, 31b, 31c, and 31d show different embodiments of the 
measurement qualification unit, each using a different measurement criterion. These 
embodiments may be used individually or in any combination. 

FIG. 31a shows a first embodiment of the qualification unit that uses a 
signal-to-noise ratio (SNR) as a measurement criterion. The SNR is defined as a ratio 
of a signal power to a noise power. To calculate the SNR, the measured signals are 
divided into blocks of signals having a predetermined period such as 40 milliseconds. 
Block 3161 calculates the signal power for each signal block by calculating the 
square-sum of the sampled signals within the block. The noise power can be measured in 
many ways, but one convenient way of measuring the noise power may be to pick the 
signal power of the signal block having the minimum signal power and to use it as the 
noise power. Block 3162 selects the signal block having the minimum power over a 
predetermined interval such as 2 seconds. Block 3163 calculates the SNR as the ratio of 
the signal power of the current block to that of the noise power. Block 3164 invalidates 
the precise direction if the SNR is below a certain threshold. 
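
The SNR criterion may be sketched as follows. The function names, the representation of the recent interval as a list of signal blocks, and the handling of a zero noise power are assumptions made for this example.

```python
def block_power(block):
    """Square-sum of the samples within one signal block."""
    return sum(s * s for s in block)


def snr_check(current_block, recent_blocks, threshold):
    """Return (is_valid, snr); noise power is the minimum block power seen recently."""
    signal = block_power(current_block)
    noise = min(block_power(b) for b in recent_blocks)
    if noise == 0.0:
        snr = float("inf") if signal > 0.0 else 0.0
    else:
        snr = signal / noise
    return snr >= threshold, snr
```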

FIG. 31b shows a second embodiment of the measurement qualification 
unit that uses a spread (range of distribution) of individual measured delays as a 
measurement criterion. The precise source direction calculated by the precise-direction 
finder represents an average direction among the individual measured directions 
measured by microphone pairs in the sector. Since delays are directly related to 
direction angles, the spread of the individual measured delays with respect to the 
individual estimated delay indicates how widely the individual directions vary with 
respect to the precise direction. Thus, the spread gives a good indication as to the 
validity of the measurements. For example, if the individual measured delays are too 
widely spread, it is likely to indicate some kind of measurement error. 

$T_e$ is defined as a vector representing the set of individual estimated 
delays $t_e$ corresponding to the precise direction, K. Block 3171 calculates $T_e$ from K 
based on the linear relation between K and $T_e$: 

$$T_e = -(RK)/c$$

where R denotes the position difference matrix representing the geometry of the 
microphone array as follows: 

$$R = \begin{bmatrix} X_2 - X_1 & Y_2 - Y_1 & Z_2 - Z_1 \\ \vdots & \vdots & \vdots \\ X_M - X_1 & Y_M - Y_1 & Z_M - Z_1 \end{bmatrix}$$

and c is the propagation velocity of sound waves. 

Block 3172 compares the individual measured delays $t_d$ with the 
individual estimated delays $t_e$ and calculates the spread of individual measured delays 
using the following measure: 

$$\sum e^2 = \sum (t_d - t_e)^2$$

If this spread exceeds a certain threshold, block 3173 invalidates the precise source 
direction. 
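
The spread measure may be sketched as follows, with the estimated delays computed from the precise direction as $-(RK)/c$. The speed of sound value and the function name are assumptions made for the example.

```python
def delay_spread(R, K, Td, c=343.0):
    """Sum of squared differences between measured delays Td and estimated delays."""
    Te = [-sum(r * k for r, k in zip(row, K)) / c for row in R]
    return sum((td - te) ** 2 for td, te in zip(Td, Te))
```

A caller would compare the returned value with a threshold and invalidate the precise direction when the threshold is exceeded.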

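Alternatively, the spread can be calculated directly from the individual 
measured delays using the following: 

$$\sum e^2 = \lVert E\,T_d \rVert^2$$

where $E = R\,(R^T R)^{-1} R^T - I$; and I is the identity matrix, so that $E\,T_d$ is the vector of 
differences between the estimated and measured delays when K is the unweighted LS estimate. 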

FIG. 31c shows a third embodiment of the measurement qualification 
unit that uses the azimuth angle, $\phi$, as a measurement criterion. If $\phi$ deviates 
significantly from the sector direction (the approximate source direction), it is likely to 
indicate that the precise direction is false. Therefore, if $\phi$ is not within a permissible 
range of angles (e.g. within +/- 60 degrees) of the sector direction, the precise direction 
is invalidated. 

FIG. 31d shows a fourth embodiment of the measurement qualification 
unit that uses the elevation angle, $\theta$, as a measurement criterion. If $\theta$ deviates 
significantly from the horizontal direction (where $\theta$ = 90°), it is likely to indicate the 
direction of reflected sound waves through the ceiling or the floor rather than that of 
direct sound waves. Therefore, if $\theta$ is not within a range of allowable angles (e.g. from 
30° to 150°), the precise direction is invalidated. 

As mentioned before, the 
above embodiments can be used selectively or combined to produce a single quality 
figure of measurement, Q, which may be sent to a target system such as a controller for 
a videoconferencing system. For example, Q may be set to 0 if any of the error 
conditions above occurs and set to the SNR otherwise. 
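
One way of combining the four criteria into the quality figure Q is sketched below. The default threshold values for the SNR and the spread are hypothetical placeholders; the angular limits follow the examples given above (within 60 degrees of the sector direction, elevation from 30 to 150 degrees).

```python
def angle_diff_deg(a, b):
    """Smallest signed difference a - b in degrees, in the range [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def quality_figure(snr, spread, azimuth, sector_azimuth, elevation,
                   snr_min=2.0, spread_max=1e-8,
                   az_tol=60.0, el_range=(30.0, 150.0)):
    """Return 0.0 if any error condition occurs, otherwise the SNR."""
    if snr < snr_min:
        return 0.0
    if spread > spread_max:
        return 0.0
    if abs(angle_diff_deg(azimuth, sector_azimuth)) > az_tol:
        return 0.0
    if not (el_range[0] <= elevation <= el_range[1]):
        return 0.0
    return snr
```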

The direction finding system of the present invention can be used in 
combination with a directional microphone system, which may include an adaptive 
filter. Such an adaptive filter is not limited to a particular kind of adaptive filter. For 
example, one can practice the present invention in combination with the invention 
disclosed in applicant's commonly assigned and co-pending U.S. patent application 
Serial No. 08/672,899, filed June 27, 1996, entitled 'System and Method for Adaptive 
Interference Canceling,' by inventor Joseph Marash and its corresponding PCT 
application WO 97/50186, published December 31, 1997. Both applications are 
incorporated by reference herein in their entirety. 

Specifically, the adaptive filter may include weight constraining means 
for truncating updated filter weight values to predetermined threshold values when an 
updated filter weight value exceeds the corresponding threshold value. The 
adaptive filter may further include inhibiting means for estimating the power of the 
main channel and the power of the reference channels and for generating an inhibit 
signal to the weight updating means based on the normalized power difference between the 
main channel and the reference channels. 
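
These two mechanisms may be sketched as follows. The symmetric clamping of each weight, the particular normalization of the power difference, and the limit value are all assumptions made for this example; they are not taken from the referenced applications.

```python
def constrain_weights(weights, thresholds):
    """Truncate each weight to +/- its threshold when its magnitude exceeds it."""
    return [max(-t, min(t, w)) for w, t in zip(weights, thresholds)]


def inhibit_update(main_block, reference_block, limit=0.5):
    """Return True when the normalized power difference exceeds `limit`.

    Assumed rule: (P_main - P_ref) / (P_main + P_ref) > limit inhibits adaptation.
    """
    p_main = sum(s * s for s in main_block) / len(main_block)
    p_ref = sum(s * s for s in reference_block) / len(reference_block)
    total = p_main + p_ref
    if total == 0.0:
        return False
    return (p_main - p_ref) / total > limit
```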

The weight constraining means may include a frequency-selective 
weight-control unit, which includes a Fast Fourier Transform (FFT) unit for receiving 
adaptive filter weights and performing the FFT of the filter weights to obtain frequency 
representation values, a set of frequency bins for storing the frequency representation 
values divided into a set of frequency bands, a set of truncating units for comparing the 
frequency representation values with a threshold assigned to each bin and for truncating 
the values if they exceed the threshold, a set of storage cells for temporarily storing the 
truncated values, and an Inverse Fast Fourier Transform (IFFT) unit for converting 
them back to the adaptive filter weights. 

The adaptive filter in the directional microphone that may be used in 
combination with the present invention may also employ a dual-processing interference 
canceling system where adaptive filter processing is used for a subset of a frequency 
range and fixed filter processing is used for another subset of the frequency range. For 
example, one can practice the present invention in combination with the invention 
disclosed in applicant's commonly assigned and co-pending U.S. patent application 
Serial No. 08/840,159, filed April 14, 1997, entitled 'Dual-Processing Interference 
Canceling System,' by inventor Joseph Marash and corresponding continuation-in-part 
application, filed April 8, 1997. Both applications are incorporated by reference herein 
in their entirety. 

It is noted that the adaptive filter processing portion of the dual 
processing may also employ the adaptive filter processing disclosed in applicant's 
commonly assigned and co-pending U.S. patent application Serial No. 08/672,899, filed 
June 27, 1996, entitled 'System and Method for Adaptive Interference Canceling,' by 
inventor Joseph Marash and its corresponding PCT application WO 97/50186, 
published December 31, 1997. 

5.6 Software Implementation 

The present invention described herein may be implemented using a 
commercially available digital signal processor (DSP) such as Analog Device's 2100 
Series or any other general purpose microprocessors. For more information on Analog 
Device 2100 Series, see Analog Device, ADSP-2100 Family User's Manual, 3rd Ed., 
1995. 

FIGS. 32A-32D show a flow chart depicting the operation of a program 
in accordance with a preferred embodiment of the present invention. The program uses 
measurement flags to indicate various error conditions. 

When the program starts (step 32100), it resets the system (step 32101) 
by resetting system variables, including the measurement flags used for indicating 
error conditions. The program then reads into registers the microphone inputs sampled at 
a sampling frequency of 64 kHz (step 32102), which is well above the 
Nyquist rate. As mentioned in Section 5.2, oversampling allows the anti-aliasing 
filters to be realized with a much gentler cut-off characteristic. Upon 
reading every 5 samples (step 32103), the program performs a low-pass filter operation 
and a decimation by taking one sample out of every 5 samples for each microphone 
(step 32104). The decimated samples are stored in the registers (step 32105). 
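
By way of illustration only, the low-pass filtering and decimation of steps 32103-32105 may be sketched as follows. This is a minimal sketch and not the filter of the described embodiment: the function name `decimate_by_5` is hypothetical, and a 5-point moving average is assumed in place of the low-pass filter, which the text does not specify. With a 64 kHz input, keeping one sample out of every 5 yields a 12.8 kHz output rate.

```python
def decimate_by_5(samples, factor=5):
    """Average each block of `factor` samples (a crude low-pass filter,
    assumed here for illustration) and emit one output sample per block."""
    out = []
    # Only whole blocks are processed; leftover samples wait for the next read.
    for start in range(0, len(samples) - factor + 1, factor):
        block = samples[start:start + factor]
        out.append(sum(block) / factor)
    return out


if __name__ == "__main__":
    # One second of 64 kHz input becomes 12800 decimated samples.
    print(len(decimate_by_5([0.0] * 64000)))
```

A practical implementation would likely use a longer FIR filter with a sharper cut-off; the moving average is chosen here only to keep the sketch short.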

The program performs a bandpass filter operation on the decimated 
samples so that the output contains frequencies ranging from 1.5 to 2.5 kHz (step 
32106). The output is stored in input memory (step 32107). The program repeats the 
above procedure until 512 new samples are obtained (step 32108). 

When 512 new samples have been obtained, the program takes each pair of 
adjacent microphones, multiplies the received signals sample by sample, and adds the 
products to obtain the zero-delay cross-correlation (step 32200), and the results are 
stored (step 32206). The calculation of zero-delay cross-correlation is repeated for all 
adjacent microphone pairs not involving the center microphone (step 32201). 

The microphone pair having the highest zero-delay cross-correlation is 
selected (step 32202) and the value is stored as the signal power (step 32207), which 
will be used later. For the two microphones adjacent to the selected pair, the program 
calculates the zero-delay cross-correlation (step 32203), and the microphone having the 
higher correlation is selected (step 32204). The program determines the sector by including 
the selected microphone pair, the neighboring microphone selected, and the center 
microphone, if there is one. 
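
For purposes of illustration, the zero-delay cross-correlation and the selection of the strongest adjacent pair (steps 32200-32202) may be sketched as below. The function names are hypothetical, and the sketch assumes that the non-center microphones are ordered around a closed perimeter so that the last microphone neighbors the first; the array geometry is not stated in this section.

```python
def zero_delay_xcorr(x, y):
    """Sum of sample-by-sample products of two equal-length signals."""
    if len(x) != len(y):
        raise ValueError("signals must have the same length")
    return sum(a * b for a, b in zip(x, y))


def best_adjacent_pair(perimeter_signals):
    """Return (i, correlation) for the adjacent pair (i, i+1) having the
    highest zero-delay cross-correlation. Microphones are assumed to be
    ordered around a closed perimeter, so index n-1 neighbors index 0."""
    n = len(perimeter_signals)
    scores = [
        zero_delay_xcorr(perimeter_signals[i], perimeter_signals[(i + 1) % n])
        for i in range(n)
    ]
    best = max(range(n), key=lambda i: scores[i])
    return best, scores[best]
```

The returned correlation corresponds to the value the text stores as the signal power.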

The program calculates the average power of the 512 samples taken from 
the center microphone (step 32300). The lowest average power during the latest 2 
seconds is set to be the noise power (steps 32301-32305). 
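
The noise power tracking of steps 32300-32305 may be sketched, under stated assumptions, as a sliding minimum over recent frame powers. The class and function names are hypothetical. The default window of 50 frames is an assumption derived from non-overlapping 512-sample frames at the 12.8 kHz decimated rate (40 ms per frame, hence 50 frames in 2 seconds); the text does not give the frame count.

```python
from collections import deque


def average_power(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)


class NoisePowerTracker:
    """Keep the frame powers of the most recent window and report the
    smallest one as the noise power."""

    def __init__(self, frames_in_window=50):
        # Assumed: 512 samples at 12.8 kHz is 40 ms, so 2 s spans 50 frames.
        self.history = deque(maxlen=frames_in_window)

    def update(self, frame):
        """Add one frame from the center microphone; return the noise power."""
        self.history.append(average_power(frame))
        return min(self.history)
```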

The program calculates the full cross-correlation of signals received by 
each microphone pair in the sector (step 32306). The program finds the peak 
cross-correlation delay, τ_s, where the correlation is maximum (step 32307). 

τ_s lies on a sampling point, but the actual maximum-correlation delay, τ_d, 
may occur between two sampling points. If τ_s is either the maximum or minimum 
possible delay (step 32308), τ_d is set to τ_s (step 32309). Otherwise, the program finds 
the actual maximum-correlation delay using the parabolic interpolation formula 
described in Section 5.4.1 (steps 32310-32312). The above steps are repeated for all the 
microphone pairs in the sector (step 32313). 
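
As a hedged illustration of the refinement of the sampled peak delay, the following sketch fits a parabola through the peak correlation value and its two neighbors and returns the position of the vertex. The formula of Section 5.4.1 is not reproduced in this passage, so the standard three-point parabolic interpolation is assumed here; the function name `refine_peak_delay` is hypothetical, and delays are expressed as indices into the correlation list.

```python
def refine_peak_delay(corr, peak_index):
    """Return the fractional index of the correlation maximum.

    `corr` holds one cross-correlation value per integer delay and
    `peak_index` is the index of its largest value."""
    # At either end of the delay range one neighbor is missing, so the
    # sampled peak is used as-is (compare steps 32308-32309).
    if peak_index == 0 or peak_index == len(corr) - 1:
        return float(peak_index)
    left = corr[peak_index - 1]
    mid = corr[peak_index]
    right = corr[peak_index + 1]
    denom = left - 2.0 * mid + right
    if denom == 0.0:
        return float(peak_index)  # flat top: no refinement possible
    # Vertex of the parabola through the three points.
    return peak_index + 0.5 * (left - right) / denom
```

For a correlation that is exactly parabolic near its peak, the vertex coincides with the true maximum; for other shapes the result is an approximation.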

The program uses the B matrix mentioned in Section 5.4.2 to obtain the 
direction vector K = [K_x, K_y, K_z] from the set of time delays (step 32400). 

The program then calculates the azimuth angle, φ, and the elevation 
angle, θ, corresponding to the direction vector obtained (step 32401). 

The program calculates the SNR as the ratio of the signal power and the 
noise power (step 32402). If the SNR exceeds a threshold (step 32403), the program 
raises the SNR Flag (step 32404). 

The program then evaluates the elevation angle, θ. If θ is not within a 
permissible range of angles (e.g. from 30° to 150°) (step 32405), the Elevation Flag is 
raised (step 32406). 

The program calculates the delays corresponding to the estimated direction 
(step 32407). The program calculates a delay spread as the sum of squares of the 
differences between the individual measured delays and the individual estimated delays 
(step 32408). If the delay spread exceeds a certain threshold (step 32409), the Delay 
Spread Flag is raised (step 32410). 

The program calculates the quality figure of measurement, Q, as a 
combination of all or part of the measurement criteria above (step 32411). For example, 
Q may be set to 0 if any of the measurement flags was raised and set to the SNR 
otherwise. 
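
The delay spread of step 32408 and the example quality figure of step 32411 may be sketched as follows. The function names are hypothetical, the threshold value is left to the caller because the text does not state one, and the measurement flags are passed in as already-evaluated booleans.

```python
def delay_spread(measured, estimated):
    """Sum of squared differences between measured and estimated delays."""
    if len(measured) != len(estimated):
        raise ValueError("delay lists must have the same length")
    return sum((m - e) ** 2 for m, e in zip(measured, estimated))


def delay_spread_flag(measured, estimated, threshold):
    """True when the delay spread exceeds the threshold (step 32409)."""
    return delay_spread(measured, estimated) > threshold


def quality_figure(snr, flags):
    """Example from the text: Q is 0 if any flag is raised, else the SNR."""
    return 0.0 if any(flags) else float(snr)
```

Other combinations of the measurement criteria are equally possible; this is only the example the text names.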

The program transfers φ, θ, and Q to a target system, such as an 
automatic camera tracking system used in a video conferencing application (step 
32412). The program resets the measurement flags (step 32413) and goes back to the 
beginning of the program (step 32414). 

While the invention has been described with reference to several 
preferred embodiments, it is not intended to be limited to those embodiments. It will be 
appreciated by those of ordinary skill in the art that many modifications can be made to 
the structure and form of the described embodiments without departing from the spirit 
and scope of the invention, which is defined and limited only in the following claims. 
For example, the present invention can be used to locate a direction of a source 
transmitting electromagnetic waves. 

SPECTRAL SUBTRACTION 

The several embodiments of the present invention are practicable with 
spectral subtraction noise and echo cancellation and, in particular, integrated with the 
DSDA technology and incorporated in a keyboard. 

Figure 33A illustrates an embodiment of the present invention 33100. 
The system receives a digital audio signal at input 33102 sampled at a frequency which 
is at least twice the bandwidth of the audio signal. In one embodiment, the signal is 
derived from a microphone signal that has been processed through an analog front end, 
an A/D converter and a decimation filter to obtain the required sampling frequency. In 
another embodiment, the input is taken from the output of a beamformer or even an 
adaptive beamformer. In that case the signal has been processed to eliminate noise 
arriving from directions other than the desired one, leaving mainly noise originating 
from the same direction as the desired signal. In yet another embodiment, the input signal 
can be obtained from a sound board when the processing is implemented on a PC 
processor or similar computer processor. 

The input samples are stored in a temporary buffer 33104 of 256 points. 
When the buffer is full, the new 256 points are combined in a combiner 33106 with the 
previous 256 points to provide 512 input points. The 512 input points are multiplied by 
multiplier 33108 with a shading window with the length of 512 points. The shading 
window contains coefficients that are multiplied with the input data accordingly. The 
shading window can be Hanning or other and it serves two goals: the first is to smooth 
the transients between two processed blocks (together with the overlap process); the 
second is to reduce the side lobes in the frequency domain and hence prevent the 
masking of low energy tonals by high energy side lobes. The shaded results are 
converted to the frequency domain through an FFT (Fast Fourier Transform) processor 
33110. Other lengths of the FFT samples (and accordingly input buffers) are possible 
including 256 points or 1024 points. 

The FFT output is a complex vector of 256 significant points (the other 
256 points are a conjugate-symmetric replica of the first 256 points). The points are 
processed in the noise processing block 33112, which includes the noise magnitude 
estimation for each frequency bin, the subtraction process that estimates the noise-free 
complex value for each frequency bin, and the residual noise reduction process. An 
IFFT (Inverse Fast Fourier Transform) processor 33114 performs the Inverse Fourier 
Transform on the complex noise free data to provide 512 time domain points. The first 
256 time domain points are summed by the summer 33116 with the previous last 256 
data points to compensate for the input overlap and shading process and output at output 
terminal 33118. The remaining 256 points are saved for the next iteration. 
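
The buffering, shading, and overlap summation described above may be sketched as follows. This is an illustrative sketch under stated assumptions: the names `BLOCK`, `FRAME`, `hann_window` and `overlap_add` are hypothetical, a periodic Hanning window is assumed because its two halves sum to one at 50% overlap, and the `process` argument stands in for the FFT, noise processing, and IFFT chain, which is omitted here.

```python
import math

BLOCK = 256          # new samples per iteration
FRAME = 2 * BLOCK    # 512-point processing frame


def hann_window(length):
    """Periodic Hanning window; at 50% overlap its shifted copies sum to 1."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / length)
            for n in range(length)]


def overlap_add(samples, process=lambda frame: frame):
    """Run `process` on overlapping shaded frames and rebuild the output.

    `process` is a placeholder for the FFT / noise processing / IFFT chain;
    the default leaves each frame unchanged."""
    window = hann_window(FRAME)
    previous_in = [0.0] * BLOCK   # previous 256 input points
    saved_tail = [0.0] * BLOCK    # last 256 output points of the prior frame
    output = []
    for start in range(0, len(samples) - BLOCK + 1, BLOCK):
        new_in = list(samples[start:start + BLOCK])
        frame = [x * w for x, w in zip(previous_in + new_in, window)]
        result = list(process(frame))
        # First half is summed with the saved tail; second half is saved.
        output.extend(a + b for a, b in zip(result[:BLOCK], saved_tail))
        saved_tail = result[BLOCK:]
        previous_in = new_in
    return output
```

With the default pass-through `process`, the output reproduces the input delayed by one 256-point block, which shows that the shading and overlap steps by themselves do not distort the signal.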

It will be appreciated that, while specific transforms are utilized in the 
preferred embodiments, it is of course understood that other transforms may be applied 
to the present invention to obtain the spectral noise signal. 

Figure 33B is a detailed description of the noise processing block 33200. 
First, the magnitude of each frequency bin (n) 33202 is estimated. The straightforward 
approach is to estimate the magnitude by calculating: 

$$Y(n) = \left( \mathrm{Real}(n)^2 + \mathrm{Imag}(n)^2 \right)^{1/2}$$

In order to save processing time and complexity, the signal magnitude 
(Y) is estimated by an estimator 33204 using an approximation formula instead: 

$$Y(n) = \max\left[\,|\mathrm{Real}(n)|,\ |\mathrm{Imag}(n)|\,\right] + 0.4 \cdot \min\left[\,|\mathrm{Real}(n)|,\ |\mathrm{Imag}(n)|\,\right]$$
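
For comparison, the exact magnitude and the approximation may be sketched as below. The function names are hypothetical; the constant 0.4 is taken from the approximation formula above.

```python
import math


def exact_magnitude(real, imag):
    """Square root of the sum of squares of the two parts."""
    return math.hypot(real, imag)


def approx_magnitude(real, imag):
    """Max + 0.4 * Min approximation; requires no square root."""
    a, b = abs(real), abs(imag)
    return max(a, b) + 0.4 * min(a, b)


if __name__ == "__main__":
    print(exact_magnitude(3.0, 4.0), approx_magnitude(3.0, 4.0))
```

For an input of unit magnitude, the approximation ranges from about 0.99 to about 1.08 depending on the phase, so it trades a small, bounded error for the removal of the square root.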

In order to reduce the instability of the spectral estimation, which 
typically plagues the FFT process (ref [2]: Oppenheim and Schafer, Digital Signal 
Processing, Prentice Hall, pp. 542-545), the present invention implements a 2D smoothing 
process. Each bin is replaced with the average of its value and the two neighboring 
bins' values (of the same time frame) by a first averager 33206. In addition, the 
smoothed value of each smoothed bin is further smoothed by a second averager 33208 
using a time exponential average with a time constant of 0.7 (which is the equivalent of 
averaging over 3 time frames). The 2D-smoothed value is then used by two processes: 
the noise estimation process by noise estimation processor 33212 and the subtraction 
process by subtractor 33210. The noise estimation process estimates the noise at each 
frequency bin and the result is used by the noise subtraction process. The output of the 
noise subtraction is fed into a residual noise reduction processor 33216 to further reduce 
the noise. In one embodiment, the time domain signal is also used by the residual noise 
processor 33216 to determine the speech free segments. The noise free signal is moved 
to the IFFT process to obtain the time domain output 33218. 
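
The 2D smoothing may be sketched as a three-bin average across frequency followed by an exponential average across time. The function names are hypothetical, and two details are assumptions not stated in the text: edge bins are averaged with their single existing neighbor, and the constant 0.7 is applied as the weight of the running (previous) value.

```python
def smooth_over_frequency(mags):
    """Average each bin with its two neighbors in the same frame.
    Edge bins use their single neighbor (an assumption)."""
    n = len(mags)
    out = []
    for i in range(n):
        lo, hi = max(0, i - 1), min(n, i + 2)
        out.append(sum(mags[lo:hi]) / (hi - lo))
    return out


def smooth_over_time(previous, current, alpha=0.7):
    """Exponential average; alpha weights the running value (assumed)."""
    return [alpha * p + (1.0 - alpha) * c for p, c in zip(previous, current)]


def smooth_2d(previous_smoothed, mags, alpha=0.7):
    """Frequency smoothing followed by time smoothing for one frame."""
    return smooth_over_time(previous_smoothed,
                            smooth_over_frequency(mags), alpha)
```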

Figure 33C is a detailed description of the noise estimation processor 
33300. Theoretically, the noise should be estimated by taking a long time average of 
the signal magnitude (Y) of non-speech time intervals. This requires that a voice switch 
be used to detect the speech/non-speech intervals. However, too sensitive a switch 
may result in the use of a speech signal for the noise estimation, which will degrade the 
voice signal. A less sensitive switch, on the other hand, may dramatically reduce the 
length of the noise time intervals (especially in continuous speech cases) and impair the 
validity of the noise estimation. 

In the present invention, a separate adaptive threshold is implemented for 
each frequency bin 33302. This allows the location of noise elements for each bin 
separately without the examination of the overall signal energy. The logic behind this 
method is that, for each syllable, the energy may appear at different frequency bands. 
At the same time, other frequency bands may contain noise elements. It is therefore 
possible to apply a non-sensitive threshold for the noise and yet locate many non-speech 
data points for each bin, even within a continuous speech case. The advantage of this 
method is that it allows the collection of many noise segments for a good and stable 
estimation of the noise, even within continuous speech segments. 

In the threshold determination process, for each frequency bin, two 
minimum values are calculated. A future minimum value is initiated every 5 seconds at 
33304 with the value of the current magnitude (Y(n)) and replaced with a smaller 
minimal value over the next 5 seconds through the following process. The future 
minimum value of each bin is compared with the current magnitude value of the signal. 
If the current magnitude is smaller than the future minimum, the future minimum is 
replaced with the magnitude which becomes the new future minimum. 

At the same time, a current minimum value is calculated at 33306. The 
current minimum is initiated every 5 seconds with the value of the future minimum that 
was determined over the previous 5 seconds and follows the minimum value of the 
signal for the next 5 seconds by comparing its value with the current magnitude value. 
The current minimum value is used by the subtraction process, while the future 
minimum is used for the initiation and refreshing of the current minimum. 
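
The interplay of the two minima may be sketched per frequency bin as follows. The class name `MinimumTracker` is hypothetical. The number of frames in 5 seconds depends on the frame rate and is therefore left as a parameter, and starting both minima from the first magnitude is an assumption made for illustration.

```python
class MinimumTracker:
    """Track the current and future minimum of one bin's magnitude."""

    def __init__(self, first_magnitude, frames_per_period):
        # frames_per_period: number of frames in 5 seconds (caller-supplied).
        self.frames_per_period = frames_per_period
        self.frames_seen = 0
        self.current_min = first_magnitude
        self.future_min = first_magnitude

    def update(self, magnitude):
        if self.frames_seen == self.frames_per_period:
            # Period boundary: the current minimum restarts from the future
            # minimum gathered over the previous period, and the future
            # minimum restarts from the present magnitude.
            self.current_min = self.future_min
            self.future_min = magnitude
            self.frames_seen = 0
        self.frames_seen += 1
        # Within a period, both minima follow the signal downward.
        self.current_min = min(self.current_min, magnitude)
        self.future_min = min(self.future_min, magnitude)
        return self.current_min
```

Because the current minimum is refreshed from the future minimum, a low value is forgotten after at most two periods, which allows the estimate to rise again when the noise level increases.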

The noise estimation mechanism of the present invention ensures a tight 
and quick estimation of the noise value, with limited memory of the process (5 
seconds), while preventing too high an estimation of the noise. 

Each bin's magnitude (Y(n)) is compared with four times the current 
minimum value of that bin by comparator 33308, which serves as the adaptive 
threshold for that bin. If the magnitude is within the range (hence below the threshold), 
it is allowed as noise and used by an exponential averaging unit 33310 that determines 
the level of the noise 33312 of that frequency. If the magnitude is above the threshold it 
is rejected for the noise estimation. The time constant for the exponential averaging is 
typically 0.95 which may be interpreted as taking the average of the last 20 frames. The 
threshold of 4*minimum value may be changed for some applications. 
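
The adaptive threshold and the exponential averaging of accepted magnitudes may be sketched for a single bin as below. The function name is hypothetical; the factor 4 and the constant 0.95 are the typical values given above, and applying 0.95 as the weight of the previous noise estimate is an assumption.

```python
def update_noise_estimate(noise, magnitude, current_min,
                          threshold_factor=4.0, alpha=0.95):
    """Return the updated noise level for one frequency bin.

    The magnitude is accepted as noise only when it lies below
    threshold_factor * current_min; otherwise the estimate is unchanged."""
    if magnitude < threshold_factor * current_min:
        return alpha * noise + (1.0 - alpha) * magnitude
    return noise
```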

Figure 33D is a detailed description of the subtraction processor 33400. 
In a straightforward approach, the value of the estimated bin noise magnitude is 
subtracted from the current bin magnitude. The phase of the current bin is calculated 
and used in conjunction with the result of the subtraction to obtain the Real and 
Imaginary parts of the result. This approach is very expensive in terms of processing 
and memory because it requires the calculation of the Sine and Cosine of the argument 
of the complex vector with consideration of the four quadrants where the complex vector 
may be positioned. An alternative approach used in the present invention is to use a Filter 
approach. The subtraction is interpreted as a filter multiplication performed by filter 
33402 where H (the filter coefficient) is: 

$$H(n) = \frac{\left|\,|Y(n)| - |N(n)|\,\right|}{|Y(n)|}$$

Where Y(n) is the magnitude of the current bin and N(n) is the noise 
estimation of that bin. The value H of the filter coefficient (of each bin separately) is 
multiplied by the Real and Imaginary parts of the current bin at 33404: 

$$E(\mathrm{Real}) = Y(\mathrm{Real}) \cdot H; \qquad E(\mathrm{Imag}) = Y(\mathrm{Imag}) \cdot H$$



Where E is the noise free complex value. In the straightforward 
approach the subtraction may result in a negative value of magnitude. This value can be 
either replaced with zero (half-wave rectification) or replaced with a positive value 
equal to the negative one (full-wave rectification). The filter approach, as expressed 
here, results in the full-wave rectification directly. The full-wave rectification provides 
a little less noise reduction but introduces far fewer artifacts to the signal. It will be 
appreciated that this filter can be modified to effect a half-wave rectification by taking 
the non-absolute value of the numerator and replacing negative values with zeros. 

Note also that the values of Y in the figures are the smoothed values of Y 
after averaging over neighboring spectral bins and over time frames (2D smoothing). 
Another approach is to use the smoothed Y only for the noise estimation (N), and to use 
the unsmoothed Y for the calculation of H. 
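
The filter approach may be sketched for a single bin as follows. The function names are hypothetical, and the guard for a zero magnitude is an assumption added to avoid division by zero. The `half_wave` option corresponds to the modification described above in which negative differences are replaced with zeros.

```python
def subtraction_gain(magnitude, noise, half_wave=False):
    """Filter coefficient H for one bin.

    Full-wave (default): H = |Y - N| / Y.
    Half-wave: a negative difference is replaced with zero."""
    if magnitude <= 0.0:
        return 0.0  # assumed guard against division by zero
    diff = magnitude - noise
    if half_wave:
        diff = max(diff, 0.0)
    return abs(diff) / magnitude


def subtract_noise(bin_value, magnitude, noise, half_wave=False):
    """Scale the Real and Imaginary parts by H; the phase is unchanged."""
    h = subtraction_gain(magnitude, noise, half_wave)
    return complex(bin_value.real * h, bin_value.imag * h)
```

Since both parts of the bin are scaled by the same real factor, no Sine or Cosine evaluation is needed to preserve the phase.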

Figure 33E illustrates the residual noise reduction processor 33500. The 
residual noise is defined as the remaining noise during non-speech intervals. The noise 
in these intervals is first reduced by the subtraction process, which does not differentiate 
between speech and non-speech time intervals. The remaining residual noise can be 
reduced further by using a voice switch 33502 and either multiplying the residual noise 
by a decaying factor or replacing it with zeros. Another alternative to the zeroing is 
replacing the residual noise with a minimum value of noise at 33504. 

Yet another approach, which avoids the voice switch, is illustrated in 
Figure 33F. The residual noise reduction processor 33506 applies a threshold similar to 
that used by the noise estimator at 33508 on the noise free output bin and replaces or decays 
the result when it is lower than the threshold at 33510. The result of the 
residual noise processing of the present invention is a quieter sound in the non-speech 
intervals. However, the appearance of artifacts such as a pumping noise when the noise 
level is switched between the speech interval and the non-speech interval may occur in 
some applications. 
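
The approach of Figure 33F may be sketched for one bin as below. The function name and the decay value of 0.5 are hypothetical, and the threshold is assumed to be the same four-times-minimum threshold used by the noise estimator, since the text says only that a similar threshold is applied.

```python
def reduce_residual(bin_value, current_min, threshold_factor=4.0, decay=0.5):
    """Attenuate a noise-processed bin that falls below the threshold.

    decay=0.0 replaces the bin with zero; a value between 0 and 1
    decays it instead."""
    if abs(bin_value) < threshold_factor * current_min:
        return bin_value * decay
    return bin_value
```

A decay factor closer to 1 reduces the residual noise less but may soften the pumping artifact mentioned above.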

The spectral subtraction technique of the present invention can be 
utilized in conjunction with the array techniques, close talk microphone technique or as 
a stand alone system. The spectral subtraction of the present invention can be 
implemented on an embedded hardware (DSP) as a stand alone system, as part of other 
embedded algorithms such as adaptive beamforming, or as a software application 
running on a PC using data obtained from a sound port. 

As illustrated in Figures 33G-33J, for example, the present invention 
may be implemented as a software application. In step 33600, the input samples are 
read. At step 33602, the read samples are stored in a buffer. If 256 new points are 
accumulated in step 33604, program control advances to step 33606; otherwise control 
returns to step 33600 where additional samples are read. Once 256 new samples are 
read, the last 512 points are moved to the processing buffer in step 33606. The 256 new 
samples stored are combined with the previous 256 points in step 33608 to obtain the 
512 points. In step 33610, a Fourier Transform is performed on the 512 points. Of 
course, another transform may be employed to obtain the spectral noise signal. In step 
33612, the 256 significant complex points resulting from the transformation are stored 
in the buffer. The second 256 points are a conjugate replica of the first 256 points and 
are redundant for real inputs. The stored data in step 33614 includes the 256 real points 
and the 256 imaginary points. Next, control advances to Figure 33H as indicated by the 
circumscribed letter A. 
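
The redundancy of the second half of the spectrum for real inputs may be checked with the following sketch. The function names are hypothetical, and a direct discrete Fourier transform is used for brevity in place of an FFT; both produce the same spectrum.

```python
import cmath


def dft(samples):
    """Direct O(N^2) discrete Fourier transform, for illustration only."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
            for i, x in enumerate(samples))
        for k in range(n)
    ]


def is_conjugate_symmetric(spectrum, tol=1e-9):
    """True when X[N-k] equals the complex conjugate of X[k] for every k."""
    n = len(spectrum)
    return all(
        abs(spectrum[(n - k) % n] - spectrum[k].conjugate()) <= tol
        for k in range(n)
    )
```

For a real input the check succeeds, so only the first half of the points needs to be stored; for a complex input it generally fails.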

In Figure 33H, the noise processing is performed wherein the magnitude 
of the signal is estimated in step 33700. Of course, the straightforward approach may 
be employed but, as discussed with reference to Figure 33B, the straightforward 
approach requires additional processing time and complexity. In step 33702, the stored 
complex points are read from the buffer and the magnitude is calculated using the 
estimation equation shown in step 33700. The result is stored in step 33704. A 
2-dimensional (2D) smoothing process is effected in steps 33706 and 33708 wherein, in 
step 33706, the estimate at each point is averaged with the estimates of adjacent points 
and, in step 33708, the estimate is averaged using an exponential average having the 
effect of averaging the estimate at each point over, for example, 3 time samples of each 
bin. In steps 33710 and 33712, the smoothed estimate is employed to determine the future 
minimum value and the current minimum value. If the smoothed estimate is less than 
the calculated future minimum value as determined in step 33710, the future minimum 
value is replaced with the smoothed estimate and stored in step 33714. 

Meanwhile, if it is determined at step 33712 that the smoothed estimate 
is less than the current minimum value, then the current minimum is replaced with the 
smoothed estimate value and stored in step 33720. The future and current minimum 
values are calculated continuously and initiated periodically, for example, every 5 
seconds as determined in step 33724, whereupon control is advanced to steps 33722 and 
33726 wherein the new future and current minimum are calculated. Afterwards, control 
advances to Figure 33I as indicated by the circumscribed letter B where the subtraction 
and residual noise reduction are effected. 

In Figure 33I, it is determined in step 33800 whether the samples are less than a 
threshold amount. In step 33804, where the samples are within the 
threshold, the samples undergo exponential averaging and are stored in the buffer at step 
33802. Otherwise, control advances directly to step 33808. At step 33808, the filter 
coefficients are determined from the signal samples retrieved in step 33806 
and the estimated samples retrieved from step 33810. Although the straightforward 
approach may be used, by which the phase is estimated and applied, the alternative Wiener 
Filter is preferred since it saves processing time and complexity. In step 33814, the 
filter transform is multiplied by the samples retrieved from step 33816 and the result is 
stored in step 33812. 

In steps 33818 and 33820, the residual noise reduction process is 
performed wherein, in step 33818, if the processed noise signal is within a threshold, 
control advances to step 33820 wherein the processed noise is subjected to replacement, 
for example, a decay. However, the residual noise reduction process may not be 
suitable in some applications where the application is negatively affected. 

It will be appreciated that, while specific values are used in the several 
equations and calculations employed in the present invention, these values may be 
different than those shown. 

In Figure 33J, the Inverse Fourier Transform is generated in step 33902 on 
the basis of the noise processed audio signal recovered in step 33904 and stored 
in step 33900. In step 33906, the time-domain signals are overlapped in order to regenerate 
the audio signal substantially without noise. 

It will be appreciated that the present invention may be practiced as a 
software application, preferably written using C or any other programming language, 
which may be embedded on, for example, a programmable memory chip or stored on a 
computer-readable medium such as, for example, an optical disk, and retrieved 
therefrom to drive a computer processor. Sample code representative of the present 
invention is illustrated in Appendix A which, as will be appreciated by those skilled in 
the art, may be modified to accommodate various operating systems and compilers or to 
include various bells and whistles without departing from the spirit and scope of the 
present invention. 

With the present invention, a spectral subtraction system is provided that 
has a simple, yet efficient, mechanism to estimate the noise magnitude spectrum even in 
poor signal to noise ratio situations and in continuous fast speech cases. An efficient 
mechanism is provided that can perform the magnitude estimation with little cost, and 
will overcome the problem of phase association. A stable mechanism is provided to 
estimate the noise spectral magnitude without the smearing of the data. 

Although preferred embodiments of the present invention and 
modifications thereof have been described in detail herein, it is to be understood that 
this invention is not limited to those precise embodiments and modifications, and that 
other modifications and variations may be effected by one skilled in the art without 
departing from the spirit and scope of the invention as defined by the appended claims. 

WHAT IS CLAIMED IS: 

1. A microphone array comprising: 

a number of microphone elements for independently receiving main and 
reference channel matrix audio signals corresponding to a main channel wherein a 
desired audio signal is received and a reference channel wherein a noise component of 
the desired audio signal is received; and 

a keyboard coupled to said array of microphones. 

2. The microphone array of claim 1, wherein said microphone 
elements are disposed adjacent a bottom surface of said keyboard such that said 
microphone elements are directed downward toward a supporting surface such that a 
pressure zone microphone effect is created thereby reducing acoustic reflections. 

3. The microphone array according to claim 2, wherein said 
microphone elements are arranged adjacent an edge of said keyboard. 

4. The microphone array of claim 1, wherein said microphone 
elements are individual microphones located at a number of corners of said keyboard. 

5. The microphone array of claim 4, wherein each corner of said 
keyboard incorporates a number of said individual microphones. 

6. The microphone array of claim 1, wherein said keyboard 
comprises a pop-up housing which houses said microphone elements in a structure 
which is substantially within said keyboard in a closed position and substantially above 
a surface of said keyboard in an open position for receiving said audio signals. 

7. The microphone array of claim 1, wherein said keyboard 
comprises a raised surface for housing said microphone elements. 

8. A microphone array comprising: 

a number of microphone elements for receiving respective audio signals 
corresponding to a main channel and a reference channel wherein said main channel 
receives a desired audio signal and said reference channel receives a noise component 
of said desired audio signal; and 

an elongated housing having a substantially flat profile for insertion 
between small gaps. 

9. The microphone array of claim 8, wherein said housing 
comprises holding means for holding said housing to a portion of an automobile. 

10. The microphone array of claim 9, wherein said portion of the 
automobile is a visor. 

11. The microphone array of claim 10, wherein a small gap is formed 
between said housing and a roof of said automobile thereby creating a pressure zone 
microphone effect for reducing acoustic reflections in said automobile. 

12. A microphone array comprising: 

a number of microphone elements for independently receiving main and 
reference channel matrix audio signals corresponding to a main channel wherein a 
desired audio signal is received and a reference channel wherein a noise component of 
the desired audio signal is received; 

a rear view mirror of an automobile having a reflective portion bounded 
by a perimeter, wherein said microphone elements are disposed along said perimeter of 
said rear view mirror. 

13. A microphone array comprising: 

a number of microphone elements for independently receiving main and 
reference channel matrix audio signals corresponding to a main channel wherein a 
desired audio signal is received and a reference channel wherein a noise component of a 
desired audio signal is received; and 

a mouse peripheral for use with a computer for housing said microphone 
elements. 

14. A microphone array comprising: 

a number of microphone elements for independently receiving main and 
reference channel matrix audio signals corresponding to a main channel wherein a 
desired audio signal is received and a reference channel wherein a noise component of 
the desired audio signal is received; and 

a video camera for housing said microphone elements. 

15. A microphone array comprising: 

a number of microphone elements for independently receiving main and 
reference channel matrix audio signals corresponding to a main channel wherein a 
desired audio signal is received and a reference channel wherein a noise component of 
the desired audio signal is received; and 

a universal voice interface for interfacing the desired audio signal and 
the noise component to a computer processor. 

16. The microphone array according to claim 15, wherein said 
universal voice interface includes converting means for converting said desired audio 
signal and said noise component to digital form. 

17. The microphone array according to claim 16, wherein said 
universal voice interface is incorporated within a rear view mirror of an automobile. 

18. The microphone array according to claim 16, wherein said 
universal voice interface provides audio noise canceling of said noise component. 

19. The microphone array according to claim 16, wherein said 
universal voice interface is incorporated in a port plug for a standard computer. 

20. The microphone array according to claim 19, wherein said 
microphone elements are disposed along a perimeter of a computer monitor. 

21. A microphone array comprising: 

a number of microphone elements for independently receiving main and 
reference channel matrix audio signals corresponding to a main channel wherein a 
desired audio signal is received and a reference channel wherein a noise component of 
the desired audio signal is received; and 

a noise canceling stethoscope for housing said microphone elements. 

22. A microphone array comprising: 

a number of microphone elements for independently receiving main and 
reference channel matrix audio signals corresponding to a main channel wherein a 
desired audio signal is received and a reference channel wherein a noise component of 
the desired audio signal is received; and 

an ultrasound device for housing said microphone elements. 



23. The microphone array according to claim 1, further comprising: 

an input for inputting an audio signal which includes a noise signal; 

a frequency spectrum generator for generating the frequency spectrum of 
said audio signal thereby generating frequency bins of said audio signal; and 

a threshold detector for detecting for each frequency bin whether a 
respective frequency bin is within said threshold thereby detecting the position of noise 
elements for each frequency bin. 

24. The microphone array according to claim 8, further comprising: 
an input for inputting an audio signal which includes a noise signal; 

a frequency spectrum generator for generating the frequency spectrum of 
said audio signal thereby generating frequency bins of said audio signal; and 

a threshold detector for detecting for each frequency bin whether a 
respective frequency bin is within said threshold thereby detecting the position of noise 
elements for each frequency bin. 

25. The microphone array according to claim 12, further comprising: 
an input for inputting an audio signal which includes a noise signal; 

a frequency spectrum generator for generating the frequency spectrum of 
said audio signal thereby generating frequency bins of said audio signal; and 

a threshold detector for detecting for each frequency bin whether a 
respective frequency bin is within said threshold thereby detecting the position of noise 
elements for each frequency bin. 

26. The microphone array according to claim 13, further comprising: 

an input for inputting an audio signal which includes a noise signal; 

a frequency spectrum generator for generating the frequency spectrum of 
said audio signal thereby generating frequency bins of said audio signal; and 

a threshold detector for detecting for each frequency bin whether a 
respective frequency bin is within said threshold thereby detecting the position of noise 
elements for each frequency bin. 

27. The microphone array according to claim 14, further comprising: 

an input for inputting an audio signal which includes a noise signal; 

a frequency spectrum generator for generating the frequency spectrum of 
said audio signal thereby generating frequency bins of said audio signal; and 

a threshold detector for detecting for each frequency bin whether a 
respective frequency bin is within said threshold thereby detecting the position of noise 
elements for each frequency bin. 

28. The microphone array according to claim 15, further comprising: 
an input for inputting an audio signal which includes a noise signal; 

a frequency spectrum generator for generating the frequency spectrum of 
said audio signal thereby generating frequency bins of said audio signal; and 
a threshold detector for detecting for each frequency bin whether a
respective frequency bin is within said threshold thereby detecting the position of noise 
elements for each frequency bin. 

29. The microphone array according to claim 21, further comprising: 
an input for inputting an audio signal which includes a noise signal; 

a frequency spectrum generator for generating the frequency spectrum of
said audio signal thereby generating frequency bins of said audio signal; and
a threshold detector for detecting for each frequency bin whether a
respective frequency bin is within said threshold thereby detecting the position of noise 
elements for each frequency bin. 

30. The microphone array according to claim 22, further comprising: 
an input for inputting an audio signal which includes a noise signal;

a frequency spectrum generator for generating the frequency spectrum of 
said audio signal thereby generating frequency bins of said audio signal; and 

a threshold detector for detecting for each frequency bin whether a 
respective frequency bin is within said threshold thereby detecting the position of noise 
elements for each frequency bin.

31. The microphone array according to claim 1, further comprising:
input means for inputting an audio signal which includes a noise signal;
frequency spectrum generating means for generating the frequency
spectrum of said audio signal thereby generating frequency bins of said audio signal;
and

threshold detecting means for detecting for each frequency bin whether a 
respective frequency bin is within said threshold thereby detecting the position of noise 
elements for each frequency bin. 

32. The microphone array according to claim 8, further comprising: 

input means for inputting an audio signal which includes a noise signal;

frequency spectrum generating means for generating the frequency 
spectrum of said audio signal thereby generating frequency bins of said audio signal; 
and

threshold detecting means for detecting for each frequency bin whether a
respective frequency bin is within said threshold thereby detecting the position of noise 
elements for each frequency bin. 

33. The microphone array according to claim 12, further comprising: 

input means for inputting an audio signal which includes a noise signal;

frequency spectrum generating means for generating the frequency 
spectrum of said audio signal thereby generating frequency bins of said audio signal; 
and 

threshold detecting means for detecting for each frequency bin whether a 
respective frequency bin is within said threshold thereby detecting the position of noise
elements for each frequency bin. 

34. The microphone array according to claim 13, further comprising: 
input means for inputting an audio signal which includes a noise signal; 
frequency spectrum generating means for generating the frequency
spectrum of said audio signal thereby generating frequency bins of said audio signal;
and 

threshold detecting means for detecting for each frequency bin whether a
respective frequency bin is within said threshold thereby detecting the position of noise 
elements for each frequency bin. 

35. The microphone array according to claim 14, further comprising:

input means for inputting an audio signal which includes a noise signal;
frequency spectrum generating means for generating the frequency
spectrum of said audio signal thereby generating frequency bins of said audio signal; 
and 

threshold detecting means for detecting for each frequency bin whether a 
respective frequency bin is within said threshold thereby detecting the position of noise
elements for each frequency bin. 

36. The microphone array according to claim 15, further comprising: 
input means for inputting an audio signal which includes a noise signal; 
frequency spectrum generating means for generating the frequency 
spectrum of said audio signal thereby generating frequency bins of said audio signal;
and 

threshold detecting means for detecting for each frequency bin whether a 
respective frequency bin is within said threshold thereby detecting the position of noise 
elements for each frequency bin. 

37. The microphone array according to claim 21, further comprising:

input means for inputting an audio signal which includes a noise signal; 

frequency spectrum generating means for generating the frequency 
spectrum of said audio signal thereby generating frequency bins of said audio signal; 
and 

threshold detecting means for detecting for each frequency bin whether a
respective frequency bin is within said threshold thereby detecting the position of noise 
elements for each frequency bin. 

38. The microphone array according to claim 22, further comprising:
input means for inputting an audio signal which includes a noise signal;

frequency spectrum generating means for generating the frequency 
spectrum of said audio signal thereby generating frequency bins of said audio signal; 
and 

threshold detecting means for detecting for each frequency bin whether a
respective frequency bin is within said threshold thereby detecting the position of noise 
elements for each frequency bin. 

39. A handheld digital assistant, comprising: 

a body for housing a processing means and a display; and 
a microphone array integral to said body and including a number of
microphone elements for independently receiving main and reference channel matrix 
audio signals corresponding to a main channel wherein a desired audio signal is 
received and a reference channel wherein a noise component of the desired audio signal 
is received; 

wherein said processing means is operable to receive microphone signals
generated by said array and perform tasks based on said microphone signals. 

40. The handheld digital assistant of claim 39, wherein said 
microphone array is selectively operable in a near field audio reception mode and a far 
field audio reception mode. 

41. A personal computer system, comprising:

a computer body for housing a processing means and a monitor; 
a microphone array including a number of microphone elements for 
independently receiving main and reference channel matrix audio signals corresponding
to a main channel wherein a desired audio signal is received and a reference channel 
wherein a noise component of the desired audio signal is received; and 

means for coupling said microphone array to said computer body; 

wherein said processing means is operable to receive microphone signals 
generated by said array and perform tasks based on said microphone signals.

42. The personal computer system of claim 41, wherein said
microphone array is integrated into said monitor. 

43. The personal computer system of claim 41, wherein said
microphone array has four microphones, said means for coupling said microphone array
to said computer body is two dual channel audio lines, and each microphone has its
signals transmitted to the computer body via a dedicated one of said channels. 
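
By way of illustration only, the frequency spectrum generator and threshold detector recited in claims 24 to 38 might be sketched as follows. This is a minimal sketch and not the claimed implementation: the function names `frequency_bins` and `detect_noise_bins` are hypothetical, the spectrum is computed with a naive discrete Fourier transform, and a bin is assumed to be "within said threshold" when its magnitude does not exceed a fixed ceiling. The claims shown here do not define the threshold rule itself.

```python
import cmath
import math


def frequency_bins(samples):
    """Return the magnitude of each frequency bin of a real-valued frame.

    Naive O(n^2) DFT, chosen for clarity; a real system would use an FFT.
    """
    n = len(samples)
    magnitudes = []
    for k in range(n // 2 + 1):
        acc = sum(
            samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        magnitudes.append(abs(acc) / n)
    return magnitudes


def detect_noise_bins(magnitudes, threshold):
    """Return indices of bins whose magnitude is within the threshold.

    Assumption: "within said threshold" means magnitude <= threshold,
    so such bins are treated as the positions of noise elements.
    """
    return [k for k, m in enumerate(magnitudes) if m <= threshold]


# Hypothetical 16-sample frame: a strong tone in bin 2 plus a weak tone in bin 5.
frame = [
    math.sin(2 * math.pi * 2 * t / 16) + 0.05 * math.sin(2 * math.pi * 5 * t / 16)
    for t in range(16)
]
noise_positions = detect_noise_bins(frequency_bins(frame), threshold=0.1)
```

In this sketch the strong tone in bin 2 exceeds the assumed threshold and is left out of `noise_positions`, while the weak tone in bin 5 and the empty bins are reported as noise positions.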